Bittner E., Nussbaumer A., Janke W. and Weigel M., "Self-affirmation model for football goal distributions", Europhysics Letters, 78 (2007), 58002.
Does scoring a goal in a football match improve the probability of scoring another one? In this paper the authors augment a statistical model with a feedback term called "self-affirmation" that reflects the (de)motivation of teams after a goal has been scored in a match. The model is applied to league data over a period of several decades and is found to capture the goal distributions very well, even in "heavy-tailed" segments of the distributions that reflect large numbers of goals being scored. The researchers also found that the level of self-affirmation varies significantly with the type of league and the concentration of high-quality teams in those competitions.
Probability distributions of correlated events, with fat tails that indicate that extreme occurrences aren't all that uncommon, appear throughout nature. They can be found in several problems in statistical mechanics, such as turbulence and seismic activity. These distributions can also be found in sport, particularly when it comes to scoring goals in football. It's a combination of their research interests and their love for football that motivated a group of German theoretical physicists to apply their theoretical machinery to goal distributions in football, in the hope that any findings would provide insight into other statistical problems in physics.
Goal-scoring models take various forms. The most basic one is a Poisson distribution, a discrete probability distribution that models the probability of a given number of events occurring in a fixed interval. This kind of expression makes sense in football, where one can score one, two, or more goals, but not 1.7 goals. With n the number of goals scored, the Poisson probability distribution of the number of goals scored in a match is:
P(n) = λ^n e^(-λ) / n!
where λ = <n> is the mean number of goals scored. If λ is not fixed but itself varies, for example from team to team or season to season, the result is a compound Poisson distribution. In the special case where λ follows a gamma distribution f(λ), the compound distribution is exactly a negative binomial distribution:
P(n) = Γ(r+n) / [n! Γ(r)] p^n (1-p)^r
with two parameters, r > 0 and 0 < p < 1. This distribution models score data very well, and several researchers have shown that over a number of seasons the number of goals scored approximates it. However, the average number of goals (the λ term) has not been found to follow a gamma distribution. As an alternative, the generalized extreme value distribution is used, which in its standard form has a shape parameter ξ, a location parameter μ, and a scale parameter σ:
P(n) = (1/σ) [1 + ξ(n-μ)/σ]^(-1-1/ξ) exp{-[1 + ξ(n-μ)/σ]^(-1/ξ)}
This model is a better fit to fat-tailed distributions, at the extremes where heavy scorelines occur.
In summary, goals in soccer matches don't follow Poisson distributions very well over the course of several seasons, but they do appear to follow negative binomial and generalized extreme value distributions, the latter being more likely to capture extreme events (your 8-1 or 10-0 results, or even 5-0). But even those events are difficult to capture correctly, and that is something that this paper addresses.
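To make the difference in the tails concrete, here is a minimal Python sketch that compares a Poisson and a negative binomial distribution with the same mean. The parameter values (r = 5, p = 0.25) and the helper names are purely illustrative assumptions, not values fitted in the paper, and the negative binomial is written in the p^n (1-p)^r convention, whose mean is rp/(1-p).

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability of n goals: lam^n e^(-lam) / n!."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def negbin_pmf(n, r, p):
    """Negative binomial: Gamma(r+n) / (n! Gamma(r)) * p^n * (1-p)^r."""
    log_coeff = math.lgamma(r + n) - math.lgamma(n + 1) - math.lgamma(r)
    return math.exp(log_coeff + n * math.log(p) + r * math.log(1 - p))

def tail(pmf, n_min, n_max=200):
    """Probability of scoring at least n_min goals (sum truncated at n_max)."""
    return sum(pmf(n) for n in range(n_min, n_max))

# Illustrative parameters only (assumption: not taken from any league data).
r, p = 5.0, 0.25
lam = r * p / (1 - p)  # give the Poisson the same mean as the negative binomial

poisson_tail = tail(lambda n: poisson_pmf(n, lam), 6)
negbin_tail = tail(lambda n: negbin_pmf(n, r, p), 6)

print(f"mean goals:                     {lam:.3f}")
print(f"P(n >= 6), Poisson:             {poisson_tail:.5f}")
print(f"P(n >= 6), negative binomial:   {negbin_tail:.5f}")
```

Both distributions have the same average, but the negative binomial has the larger variance and puts more probability on six or more goals, which is the kind of scoreline a plain Poisson model underestimates.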
The premise of the authors is that goals in soccer are not independent events; rather, scoring a goal gives positive feedback to the team that scores it. Conversely, there's a demotivating effect upon the team that has been scored against. I've watched several matches where I felt that if one team managed to score a goal, they would find it easier to score another one or two. This psychological effect on teams might account for heavier-tailed distributions in some leagues, and it is this effect that Bittner et al. attempt to capture. They do so by creating a self-affirmation factor κ, which is either added to or multiplied with the goal-scoring probability after a goal has been scored. Writing p(n) for the probability of scoring in a given minute once the team has already scored n goals, the researchers consider two models:
Model A (an additive model):
p(n) = p(n-1) + κ
and Model B (a multiplicative model):
p(n) = κp(n-1)
Bittner et al. rewrote the binomial and extreme value probability distributions in terms of the above expressions using a Pascal recurrence relation, which uses the previous numbers in a sequence to calculate the new numbers:
P_T(n) = [1 - p(n)] P_{T-1}(n) + p(n-1) P_{T-1}(n-1)
where T = 1,…,90 counts the minutes of a football match and P_T(n) is the probability that a team has scored n goals after T minutes.
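The recurrence is easy to iterate numerically. The sketch below is my own minimal Python reading of it, not the authors' code: the function name `goal_distribution`, the starting probability `p0`, the κ values, and the clipping of p(n) to the range [0, 1] are all assumptions made for illustration. With no feedback (κ = 0 in Model A) the recurrence reduces to an ordinary binomial distribution over 90 trials, which gives a handy sanity check.

```python
import math

def goal_distribution(p0, kappa, model, minutes=90):
    """Iterate P_T(n) = [1 - p(n)] P_{T-1}(n) + p(n-1) P_{T-1}(n-1).

    Returns the list [P_T(0), ..., P_T(minutes)] for T = minutes.
    """
    if model not in ("A", "B"):
        raise ValueError("model must be 'A' (additive) or 'B' (multiplicative)")

    # p[n]: per-minute scoring probability once n goals have been scored.
    p = [p0]
    for _ in range(minutes):
        nxt = p[-1] + kappa if model == "A" else p[-1] * kappa
        # Assumption: clip so that p(n) always stays a valid probability.
        p.append(min(max(nxt, 0.0), 1.0))

    # Before kick-off the team has scored zero goals with certainty.
    dist = [1.0] + [0.0] * minutes
    for _ in range(minutes):
        dist = [
            (1.0 - p[n]) * dist[n] + (p[n - 1] * dist[n - 1] if n > 0 else 0.0)
            for n in range(minutes + 1)
        ]
    return dist

p0 = 0.017  # illustrative per-minute scoring probability (assumption)

no_feedback = goal_distribution(p0, 0.0, "A")
binomial = [math.comb(90, n) * p0**n * (1 - p0) ** (90 - n) for n in range(91)]
model_a = goal_distribution(p0, 0.002, "A")
model_b = goal_distribution(p0, 1.15, "B")

print("max deviation from binomial:", max(abs(a - b) for a, b in zip(no_feedback, binomial)))
for name, dist in (("no feedback", no_feedback), ("Model A", model_a), ("Model B", model_b)):
    print(f"{name}: P(n >= 6) = {sum(dist[6:]):.5f}")
```

Switching on the feedback term in either model moves probability from low scores into the tail, which is the effect the self-affirmation parameter is meant to capture.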
To test these models, the researchers used match result data from the East German Oberliga, the (West) German Bundesliga, the German Frauen-Bundesliga (women's league), and World Cup qualifiers, a total of over 20,000 matches over the last 70 years. The probability density functions (pdfs) of the leagues and qualifiers were determined from the data, and the distributions were fitted to the pdfs.
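As a rough illustration of what such a fit involves, the following Python sketch recovers the parameters of the multiplicative model from a synthetic goal histogram by minimizing a Pearson chi-square over a coarse grid. Everything here is an assumption for demonstration: the "observed" counts are generated from the model itself rather than taken from any league, the grid search stands in for whatever fitting procedure the authors actually used, and the names `model_b_distribution` and `chi_square` are hypothetical.

```python
def model_b_distribution(p0, kappa, minutes=90):
    """Goal distribution after `minutes` steps with p(n) = kappa * p(n-1)."""
    # Assumption: clip p(n) at 1 so it remains a valid probability.
    p = [min(p0 * kappa**n, 1.0) for n in range(minutes + 1)]
    dist = [1.0] + [0.0] * minutes
    for _ in range(minutes):
        dist = [
            (1.0 - p[n]) * dist[n] + (p[n - 1] * dist[n - 1] if n > 0 else 0.0)
            for n in range(minutes + 1)
        ]
    return dist

def chi_square(observed, expected, min_count=5):
    """Pearson chi-square over the bins with at least min_count observations."""
    return sum(
        (o - e) ** 2 / e
        for o, e in zip(observed, expected)
        if o >= min_count
    )

# Synthetic "data": expected counts from known parameters, rounded to integers.
n_matches = 20_000
true_p0, true_kappa = 0.016, 1.10
observed = [round(n_matches * q) for q in model_b_distribution(true_p0, true_kappa)]

# Coarse grid search over (p0, kappa); a real fit would use a proper optimizer.
best = None
for i in range(10, 31, 2):        # p0 = 0.010, 0.012, ..., 0.030
    for j in range(0, 31, 5):     # kappa = 1.00, 1.05, ..., 1.30
        p0, kappa = i / 1000, 1 + j / 100
        expected = [n_matches * q for q in model_b_distribution(p0, kappa)]
        score = chi_square(observed, expected)
        if best is None or score < best[0]:
            best = (score, p0, kappa)

print(f"best fit: p0 = {best[1]:.3f}, kappa = {best[2]:.2f}, chi-square = {best[0]:.3f}")
```

Because the probability of a goalless match depends only on p0 while the higher bins also depend on κ, the two parameters are pinned down by different parts of the histogram, which is why even a crude search like this one can separate them.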
The self-affirmation variable has a significant effect on the model fitting. As expected, the Poisson distribution has the poorest fit as reflected in the high chi-square value. The negative binomial distribution's fit is much better, and the generalized extreme value distribution works well, especially at the extreme end of the pdf. Self-affirmation, or positive feedback, appears to have a multiplicative effect on the goal scoring probability, and it also appears to be more pronounced in leagues that are either less professional or have large concentrations of talent among a few teams in the league. It is in those two cases that the inclusion of the self-affirmation variable allows the probability distribution to have a better fit at the extreme end of the distribution. Positive feedback is not as big a factor in the Bundesliga, but it appears to be more significant in the women's league and also at the World Cup qualifiers — two competitions where there can be a big disparity between competing sides.
It's a neat paper, short but packed with tons of geeky goodness, like all of the physics journal papers. I found it really neat how a simple parameter, admittedly crude, can capture what's going on at the extreme end of the goal scoring distribution. As the authors said, the next step is to find out how self-affirmation affects the probability of goals scored during a match, which would require access to not just the final scores but also the time of scoring. The concepts could provide insight into statistical problems in physics, but the results could also be useful for the betting houses.
ADDENDUM: This paper has generated a lot of international attention, and the website of one of the authors (scroll down to "Football Fever") has a nice summary of all that attention.