I see that a lot of people have been visiting this site because of Devin Pleuler’s article on the soccer Pythagorean expectation that we’ve developed. Welcome to Soccermetrics! We’re all about creating knowledge and insight from the data revolution that has overtaken soccer in the last few years. If you’re intrigued and want to dip more than a couple of toes in the water, please read on.

Before the soccer Pythagorean was derived, many people attempted to apply the baseball Pythagorean directly to soccer. I haven’t written on its shortcomings in much detail, but in this post I’m going to go into the reasons why the original formula doesn’t work for football.

Here is Bill James’ original Pythagorean formula:
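In its commonly cited form, James’ formula estimates a team’s winning fraction from its runs scored ($RS$) and runs allowed ($RA$). The symbol names and the general exponent $\gamma$ below are assumed notation and may differ from the notation originally used in this post. When the formula is carried over to soccer, goals for and goals against stand in for runs.

```latex
\text{Win fraction} = \frac{RS^{2}}{RS^{2} + RA^{2}}
\qquad \text{or, with a general exponent } \gamma, \qquad
\text{Win fraction} = \frac{RS^{\gamma}}{RS^{\gamma} + RA^{\gamma}}
```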
Applied to soccer, this formula has the following problems:
- The baseball Pythagorean consistently overestimates point totals.
- The root-mean-square error of the estimation is very high.
To illustrate, let’s apply James’ Pythagorean to last season’s English Premier League.

To obtain the Pythagorean exponent that best fits the expectation to reality, we take the difference between each team’s actual point total and its expected total, square that amount, repeat the process for all of the teams in the league, and then add the values together and divide by the number of teams. This is the league mean-square error of the Pythagorean, and its square root is the league root-mean-square error, or RMSE. We want to find the Pythagorean exponent that minimizes the league RMSE.

We can find this exponent graphically by calculating the league RMSE over a range of exponent values and plotting the results. (There are more sophisticated ways to find the exponent, but the math is much more involved.) Such a plot looks like this:

[Figure: league RMSE plotted against the Pythagorean exponent]
| Team | GF | GA | Pts | Pythag (BJ) | Pythag (HH) | Pts – PyBJ | Pts – PyHH |
|---|---|---|---|---|---|---|---|
| West Bromwich Albion | 45 | 52 | 47 | 52 | 46 | -5 | 1 |
| Queens Park Rangers | 43 | 66 | 37 | 42 | 39 | -5 | -2 |
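The exponent search described earlier can be sketched in a few lines of Python. This is a minimal sketch, not necessarily the calculation used for the table: it assumes that expected points are three points per win times matches played times the Pythagorean win fraction, and the function names, search range, and step size are illustrative choices. The sample data are only the two rows shown in the table, so the exponent it returns is not a meaningful league fit; a real fit would use all 20 teams.

```python
import math

MATCHES = 38        # matches per team in a Premier League season
POINTS_PER_WIN = 3  # assumption: win fraction is converted to points at 3 per win


def pythagorean_points(gf, ga, exponent, matches=MATCHES):
    """Expected points from goals for/against under a James-style Pythagorean."""
    win_fraction = gf**exponent / (gf**exponent + ga**exponent)
    return POINTS_PER_WIN * matches * win_fraction


def league_rmse(teams, exponent):
    """Root-mean-square error between actual and expected point totals."""
    squared_errors = [
        (pts - pythagorean_points(gf, ga, exponent)) ** 2
        for _, gf, ga, pts in teams
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_exponent(teams, low=0.5, high=3.0, step=0.01):
    """Grid search: return the (exponent, rmse) pair with the smallest RMSE."""
    n_steps = int(round((high - low) / step))
    candidates = [low + i * step for i in range(n_steps + 1)]
    return min(
        ((g, league_rmse(teams, g)) for g in candidates),
        key=lambda pair: pair[1],
    )


# (team, goals for, goals against, actual points), taken from the table rows.
teams = [
    ("West Bromwich Albion", 45, 52, 47),
    ("Queens Park Rangers", 43, 66, 37),
]

exponent, rmse = best_exponent(teams)
print(f"Best exponent on this two-team sample: {exponent:.2f} (RMSE {rmse:.2f})")
for name, gf, ga, pts in teams:
    expected = pythagorean_points(gf, ga, 2.0)
    print(f"{name}: actual {pts}, expected at exponent 2: {expected:.1f}")
```

Plotting `league_rmse` against the candidate exponents gives the kind of curve described above; the grid search simply reads off its lowest point.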
The main feature of the Jamesian Pythagorean when applied to soccer is that it consistently overestimates point totals. Some teams persistently overperform in both the Jamesian Pythagorean and the soccer Pythagorean, such as Newcastle, Stoke, and Wigan, and some teams identified as underperformers in the soccer Pythagorean have highly negative residuals in James’ Pythagorean. Manchester United’s performance, which the soccer Pythagorean rates as a significant overperformance, is in line with statistical expectations according to James’ formula.

It’s possible to argue that both expectations do quite well at identifying the significant outliers in a league competition. My perspective is that the inability to estimate the probability of draws in the Jamesian Pythagorean yields an estimator that has a persistently high RMSE and a high level of bias. Neither characteristic is found in the Jamesian Pythagorean when applied to baseball, and I would hypothesize that neither is present when it is applied to basketball, American football, or any other sport with no draws (or very few).

So to conclude, Bill James’ Pythagorean expectation provides a lot of insight in baseball and has been adapted to other sports, but it fails in soccer. We’ve addressed the underlying assumptions of the original formula and developed a new metric that adapts those assumptions to football competitions. We like the results and we’re developing more metrics like it to enlighten our understanding of this great game.