Weston, M. et al. (2006) "The effect of match standard and referee experience upon the objective and subjective match workload of English Premier League referees", Journal of Science and Medicine in Sport, 9 (3): 256-262.
The above four papers present an overview of the current state-of-the-art in referee analytics in football over the past five years. Physiological and technical performance data were obtained by either in-house measurement or professional data collection equipment in order to obtain a picture of the physical and mental demands on central and assistant referees. The major findings were that while physiological performance variables are strongly dependent on age, variables corresponding to cognitive or spatial performance are more dependent on match experience, particularly in high-intensity matches.
Referee performance does not attract as much attention as player performance in the soccer analytics community, but it is equally important because of the refereeing team's impact on the outcome of the match. If in-match player performance data are difficult to obtain, in-match referee data are doubly so. It's not difficult to obtain a record of yellow and red cards for every match a central referee has officiated, and in fact there are websites devoted to just that. But the finely-grained data that clubs and the betting community would love to have are seen as highly sensitive information by the leagues and are tightly guarded.
As a result, publications on referee analytics in soccer are more difficult to come by. Nevertheless, I managed to find four papers that I believe encompass the state-of-the-art in research on referee physiological and technical performance over the last five years. Three were written by a researcher who was affiliated with the English Premier League for several years, and the fourth was written by a team of Spanish researchers with significant input from a former FIFA referee and the head of FIFA's Referees Committee. All had approval from competition officials to collect data and publish their work: Weston from the Premier League, Mallo from FIFA. As you might expect, central referees get much of the attention, but the Spanish publication focuses on the performance of assistant referees.
Data are collected over a long period of time and come from a variety of sources: Prozone, which supply the best data for referee performance analysis in my opinion, in-house video analysis software, heart-monitoring equipment, and ratings of perceived exertion (RPE). Both Weston and Mallo use heart-monitoring equipment to measure the referees' heart rate during the game, and Weston also uses the RPE to assess the referees' level of physical exertion over the entire match. The RPE is a subjective measurement and one has to be careful to use it correctly, but the referees in the study were familiar with it from their training programs. The most detailed data came from Prozone equipment, which provided the following:
- total distance covered (TD)
- high-intensity running distance, speed > 3.5 m/s (HIR)
- number of sprints, speed > 7.0 m/s (SC)
- top sprint speed (TS)
- mean distance from ball (DB)
- mean distance from foul (DF)
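To make the definitions above concrete, here is a minimal sketch of how the first four quantities could be derived from a speed trace. It assumes speeds sampled at a fixed interval and counts a sprint each time the speed rises above the sprint threshold. The function name, the sampling interval, and the trace itself are made up for illustration; this is not a description of how Prozone actually computes these figures.

```python
# Thresholds are the ones quoted in the list above; everything else is illustrative.
HIR_THRESHOLD = 3.5     # m/s, high-intensity running
SPRINT_THRESHOLD = 7.0  # m/s, sprinting


def summarize_speed_trace(speeds, dt=1.0):
    """Summarize a speed trace (m/s) sampled every `dt` seconds (hypothetical helper)."""
    total_distance = sum(v * dt for v in speeds)
    # As defined here, HIR distance also includes distance covered while sprinting.
    hir_distance = sum(v * dt for v in speeds if v > HIR_THRESHOLD)

    # Count one sprint each time the speed rises above the sprint threshold.
    sprint_count = 0
    sprinting = False
    for v in speeds:
        if v > SPRINT_THRESHOLD and not sprinting:
            sprint_count += 1
        sprinting = v > SPRINT_THRESHOLD

    return {
        "TD": total_distance,
        "HIR": hir_distance,
        "SC": sprint_count,
        "TS": max(speeds, default=0.0),
    }


# Made-up trace with two separate bursts above 7.0 m/s.
trace = [2.0, 4.0, 7.5, 8.0, 3.0, 7.2, 1.0]
summary = summarize_speed_trace(trace)
```

The distance-from-ball and distance-from-foul measures would need position data for the ball and the incident as well, so they are left out of this sketch.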
Weston and his team use Prozone data; Mallo and his team use in-house video analysis software to digitize images and develop position and velocity data. Weston's studies cover the 2003-04 to the 2007-08 Premier League seasons, while Mallo's study focuses on the 2003 FIFA U-17 (Boys) World Cup and the 2005 FIFA Confederations Cup. The research methodologies don't change much between the studies: a combination of significance testing (Student's t-test and/or ANOVA tests) and correlation analysis between two dependent variables. It's not all that complicated, but the primary challenge is that the noise in the data makes definitive statements difficult. More on this later.
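For readers who want to see what those two tools look like in practice, the sketch below computes a pooled-variance Student's t statistic for two groups and a Pearson correlation coefficient, using only the Python standard library. The helper names and all of the numbers are invented for illustration; they are not data from any of the four papers, and the papers may well have used different software or test variants.

```python
import math
import statistics


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Hypothetical mean heart rates (beats per minute) at two match standards.
top_flight_hr = [162, 158, 165, 160, 163]
second_tier_hr = [155, 157, 152, 158, 154]
t_stat = student_t(top_flight_hr, second_tier_hr)

# Hypothetical perceived-exertion ratings paired with the top-flight heart rates.
rpe = [7.5, 6.5, 8.0, 7.0, 7.5]
r = pearson_r(top_flight_hr, rpe)
```

Turning the t statistic into a p-value requires the t distribution with `na + nb - 2` degrees of freedom, which the standard library does not provide; a statistics package would normally handle that step.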
The main results of the studies were the following:
- As the match standard goes up, the physical demands on the referees go up. Weston's 2006 paper presented data on mean heart rate and perceived exertion as a function of match standard, based on measurements of referees who worked matches in both the Premier League and the Football League Championship. His group showed that both mean heart rate and perceived exertion differed significantly with the standard of play. Moreover, perceived exertion and heart rate were well correlated, which indicates that the combination of those variables could serve as a proxy for match intensity. Mallo's study of the assistant referees showed a similarly significant relationship in FIFA competitions between the level of competition and the amount of high-intensity running by the linesmen.
- Experience doesn't matter in terms of physical exertion. In contrast to match standard, Weston et al. showed that the level of referee experience doesn't make a significant difference to heart rate and perceived exertion. The implication of that result is that the number of matches refereed doesn't make a referee more able to handle the demands of top-level football; it's the referee's level of physical fitness that is more important.
- The level of physical exertion drops off significantly in the second half. This result is no surprise, as physiological studies of football players reveal a similar drop-off in physical performance. It is especially true in the opening fifteen minutes after the restart as players' (and referees') leg muscles try to warm up. The result has implications for in-match fitness of the referees; perhaps they should spend time stretching or on a stationary bike to maintain their ability to physically perform in the opening minutes of the second half.
- Older referees may not be able to run, but they know where to stand. Just like an older and more experienced defender who knows where to be to intercept a pass, experienced referees are able to anticipate plays and stand in the best position to adjudicate a foul. Weston's 2010 paper reported a negative correlation between age and the physical quantities that indicate exertion, such as total distance, high-intensity running distance, and number of sprints. However, there is almost no correlation between age and either mean distance from the ball, mean distance from the foul, or mean heart rate. Such data appear to say that while referees who approach the FIFA-mandated retirement age may not be able to cover the same distances as younger referees, their cognitive ability coupled with their experience remains strong.
- Referee data remain very noisy. Weston's 2011 paper expanded the study of the previous year's paper by an extra season and considered the match-to-match variation in a referee's physical and cognitive performance. The data show that each match is different — players approach each match differently on a tactical, physical, or emotional level, and referees' demands change accordingly. Generally, if the central players are busy, the referee will be busy as well. While data such as distance from the ball or foul have a low level of variance relative to their means, other physical data such as distance run and high-speed running have a very high coefficient of variation.
The implication of such a result is that if you want to say anything meaningful about the significance of the distance measurements (say 10% — which is barely statistically significant), you will need a sample size of almost 200 referees. That's essentially a 20-year study if you only look at Premier League referees, and if you expand the pool to referees in lower divisions you must then adjust for playing standard. So, significance testing will reveal very little except for a few variables such as referee positioning on fouls.
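The link between a high coefficient of variation and a large required sample can be sketched with a textbook normal-approximation formula for comparing two group means: n per group ≈ 2 (z_α/2 + z_β)² (CV / d)², where d is the relative difference to be detected. This is an assumption on my part about how such an estimate might be made, not necessarily the calculation used in Weston's paper, and the CV values and match distances below are made up.

```python
import math
import statistics
from statistics import NormalDist


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def referees_per_group(cv, relative_diff, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect `relative_diff`
    between two means, given a coefficient of variation `cv`
    (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (cv / relative_diff) ** 2)


# Hypothetical high-intensity running distances (metres) for one referee.
hir_by_match = [620, 910, 450, 780, 1050, 560]
cv_hir = coefficient_of_variation(hir_by_match)

# Required group size grows with the square of the CV for a fixed 10% difference.
n_low_noise = referees_per_group(cv=0.10, relative_diff=0.10)
n_high_noise = referees_per_group(cv=0.30, relative_diff=0.10)
```

Because the required sample grows with the square of CV / d, tripling the noise in a variable multiplies the number of referees needed by roughly nine, which is why the low-variance positioning measures are so much easier to test than the running distances.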
Most of the studies remain physiological in nature, and there remain few published studies on the relationship between age, experience, and the ability to make the correct decision, not just be in the right place to make a decision. Such a study would require the input of referee assessors at the games, and such data would be highly protected by the league. Nor has there been a study on the types of fouls that referees are more likely to signal, and whether certain fouls are whistled more in a certain season (points of emphasis by the competition committee). Again, those results would also be classified.
The four papers are worth reading to obtain an idea of the current state of referee analytics in football, and there are several references within the papers that merit further consideration. There are more sophisticated data on referee performance than ever before, and analysis will continue to be developed on referees as on players. It's possible to do a regression analysis of the fouls, yellow and red cards issued by a referee during a match as a function of other variables, but the main analysis question is whether the coefficient of variation between matches will be low enough to say anything meaningful. In the end, the sensitivity of the finely-grained technical data will mean that advanced referee analytics will remain confined to either domestic league competitions or continental and global governing bodies.
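As a rough illustration of the regression idea mentioned above, the following sketch fits a straight line by ordinary least squares to yellow cards as a function of fouls whistled. The match figures are invented, the helper name is hypothetical, and a count model such as Poisson regression would arguably suit card counts better; the point is only to show the shape of the analysis.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up per-match figures for a single referee.
fouls_whistled = [18, 25, 22, 30, 27, 20]
yellow_cards = [2, 4, 3, 5, 4, 3]

slope, intercept = fit_line(fouls_whistled, yellow_cards)
```

With real publicly available card records, the scatter of the points around the fitted line would indicate whether the match-to-match variation is small enough for the fit to be informative.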