Why the baseball Pythagorean doesn’t work for soccer

I see that a lot of people have been visiting this site because of Devin Pleuler’s article on the soccer Pythagorean expectation that we’ve developed.  Welcome to Soccermetrics!  We’re all about creating knowledge and insight from the data revolution that has overtaken soccer in the last few years.  If you’re intrigued and want to dip more than a couple of toes in the water, please read on.

Before the soccer Pythagorean was derived, a lot of people attempted to apply the baseball Pythagorean directly to soccer.  I haven’t written about its shortcomings in much detail, but in this post I’m going to go into the reasons why the original formula doesn’t work for football.

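Here is Bill James’ original Pythagorean formula:

$$\text{win percentage} = \frac{(\text{runs scored})^{2}}{(\text{runs scored})^{2} + (\text{runs allowed})^{2}}$$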

This equation (or to be really technical, a model) relates win percentage in baseball to runs scored and runs allowed.  Now, Bill James originally used 2.0 as the exponent term that best fit the expected win percentages of all the teams in a league to their real values.  Selecting 2.0 as the exponent is convenient because it makes the formula look like the famous Pythagorean formula and permits some visual insight (I’ll leave that as an exercise for the motivated reader).  Later sabermetricians showed that the best exponent to use is approximately 1.8, which speaks very well of James’ intuition.

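Say you want to apply this formula to soccer.  Replace “runs” with “goals” (goals for, GF, and goals against, GA) and win percentage with points percentage (points earned / possible points), and then set the Pythagorean exponent to the Greek letter γ:

$$\text{points percentage} = \frac{GF^{\gamma}}{GF^{\gamma} + GA^{\gamma}}$$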

Multiply points percentage by total possible points (number of league matches times three points) and you get expected points.  It’s no surprise that so many have attempted to use James’ formula first.  It’s simple and intuitive to use and understand.  But when applied to football the baseball Pythagorean falls short in two ways:

  1. The baseball Pythagorean consistently overestimates point totals.
  2. The root-mean-square error of the estimation is very high.
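
The adaptation described above (goals in place of runs, points percentage in place of win percentage) can be sketched in a few lines of Python. This is only an illustration: the function name `james_expected_points` and its parameters are made up for this example, and the default exponent of 1.3 is simply the value fitted to the Premier League later in the post.

```python
def james_expected_points(goals_for, goals_against, matches, gamma=1.3):
    """Expected league points from the James-style formula applied to soccer.

    Hypothetical helper for illustration; assumes at least one goal was
    scored or conceded so the denominator is non-zero.
    """
    gf = goals_for ** gamma
    ga = goals_against ** gamma
    points_pct = gf / (gf + ga)          # share of the available points
    return points_pct * 3 * matches      # three points on offer per match


# Manchester City last season: 93 scored, 29 conceded over 38 matches.
# Compare the rounded result with the Pythag (BJ) column in the table below.
print(round(james_expected_points(93, 29, 38)))
```

Note that a team with equal goals scored and allowed is always assigned exactly half of the available points, whatever the exponent.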

To illustrate, let’s apply James’ Pythagorean to last season’s English Premier League.

To obtain the Pythagorean exponent that best fits the expectation to reality, we take the difference between each team’s actual point total and its expected total, square that amount, repeat the process for all of the teams in the league, and then average the squared values.  This is the league mean-square error of the Pythagorean, and if you take the square root you get the league root-mean-square error, or RMSE.  We want to find the Pythagorean exponent that minimizes the league RMSE.
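
Written out, the quantity being minimized looks like this. The notation is introduced here only for illustration: $N$ is the number of teams in the league, $P_i$ is team $i$'s actual point total, and $\hat{P}_i(\gamma)$ is its expected total for a given exponent.

$$\mathrm{RMSE}(\gamma) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - \hat{P}_i(\gamma)\right)^{2}}$$

Dividing by $N$ and taking the square root do not change which γ gives the minimum; they only put the error back into units of points per team.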

We can find this exponent graphically by calculating the league RMSE over a range of values and plotting them.  (There are more sophisticated ways to find the exponent, but the math is much more involved.)  Such a plot looks like this:

The league RMSE bottoms out (reaches a minimum, to be precise) at around γ = 1.3, which we will call the league Pythagorean exponent.  As a quick aside, how does the league exponent change for different leagues?  Let’s look at last season’s Spanish La Liga:

Here the league RMSE reaches a minimum at γ = 1.2.  In fact, the league Pythagorean exponents for Bill James’ formula fall between 1.1 and 1.4, admittedly over a small sample of leagues (I looked at last season’s Big Five European leagues and the just-concluded MLS regular season).

But take a look at both figures again.  The red line is the RMSE for the soccer Pythagorean that I derived, using the Pythagorean exponent that best fit expected points to reality over a large number of leagues.  (It’s our ‘universal’ Pythagorean exponent of 1.70.)  In both plots, the best-case RMSE from James’ Pythagorean formula applied to soccer is still larger than the RMSE of the soccer Pythagorean.  Over the leagues that I’ve plotted this finding persists.  Again, it’s a small sample space and I’ll leave an exhaustive study to someone else (I’ll happily link to the post that presents one), but I’m confident that even in the best case, the Jamesian Pythagorean has a consistently higher RMSE than my soccer Pythagorean.

So what do the estimated point totals look like?  Once again, let’s look at the English Premier League.  The table below presents goals scored/allowed by Premiership teams, with actual point totals, Pythagorean expectations (James’ and mine), and the resulting residuals.  The James Pythagorean uses an exponent of 1.3, and the soccer Pythagorean uses an exponent of 1.7.

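| Team | GF | GA | Pts | Pythag (BJ) | Pythag (HH) | Pts – PyBJ | Pts – PyHH |
|---|---|---|---|---|---|---|---|
| Manchester City | 93 | 29 | 89 | 93 | 88 | -4 | 1 |
| Manchester United | 89 | 33 | 89 | 89 | 82 | -0 | 7 |
| Arsenal | 74 | 49 | 70 | 72 | 68 | -2 | 2 |
| Tottenham Hotspur | 66 | 41 | 69 | 74 | 69 | -5 | 0 |
| Newcastle United | 56 | 51 | 65 | 60 | 55 | 5 | 10 |
| Chelsea | 65 | 46 | 64 | 70 | 63 | -6 | 1 |
| Everton | 50 | 40 | 56 | 65 | 59 | -9 | -3 |
| Liverpool | 47 | 40 | 52 | 63 | 56 | -11 | -4 |
| Fulham | 48 | 51 | 52 | 55 | 49 | -3 | 3 |
| West Bromwich Albion | 45 | 52 | 47 | 52 | 46 | -5 | 1 |
| Swansea City | 44 | 51 | 47 | 52 | 46 | -5 | 1 |
| Norwich City | 52 | 66 | 47 | 48 | 45 | -1 | 2 |
| Sunderland | 45 | 46 | 45 | 56 | 50 | -11 | -5 |
| Stoke City | 36 | 53 | 45 | 43 | 41 | 2 | 4 |
| Wigan Athletic | 42 | 62 | 43 | 43 | 40 | 0 | 3 |
| Aston Villa | 37 | 53 | 38 | 44 | 41 | -6 | -3 |
| Queens Park Rangers | 43 | 66 | 37 | 42 | 39 | -5 | -2 |
| Bolton Wanderers | 46 | 77 | 36 | 39 | 35 | -3 | 1 |
| Blackburn Rovers | 48 | 78 | 31 | 40 | 35 | -9 | -4 |
| Wolverhampton Wanderers | 40 | 82 | 25 | 32 | 29 | -7 | -4 |
| RMSE | | | | | | 5.808 | 3.814 |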
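
For readers who want to check the James column themselves, here is a minimal sketch that recomputes the league RMSE from the goals and points in the table and scans a range of exponents for the minimum. It covers only the James-style formula, not the soccer Pythagorean. The names `EPL`, `expected_points`, and `league_rmse`, the 38-match season length, and the 0.05 grid step are assumptions made for this example; a coarse grid scan stands in for the graphical search described earlier.

```python
import math

# (team, goals for, goals against, actual points), copied from the table above
EPL = [
    ("Manchester City", 93, 29, 89), ("Manchester United", 89, 33, 89),
    ("Arsenal", 74, 49, 70), ("Tottenham Hotspur", 66, 41, 69),
    ("Newcastle United", 56, 51, 65), ("Chelsea", 65, 46, 64),
    ("Everton", 50, 40, 56), ("Liverpool", 47, 40, 52),
    ("Fulham", 48, 51, 52), ("West Bromwich Albion", 45, 52, 47),
    ("Swansea City", 44, 51, 47), ("Norwich City", 52, 66, 47),
    ("Sunderland", 45, 46, 45), ("Stoke City", 36, 53, 45),
    ("Wigan Athletic", 42, 62, 43), ("Aston Villa", 37, 53, 38),
    ("Queens Park Rangers", 43, 66, 37), ("Bolton Wanderers", 46, 77, 36),
    ("Blackburn Rovers", 48, 78, 31), ("Wolverhampton Wanderers", 40, 82, 25),
]
MATCHES = 38  # assumption: 20 clubs, each playing the others home and away


def expected_points(gf, ga, gamma):
    """James-style points share times the points available in a season."""
    share = gf ** gamma / (gf ** gamma + ga ** gamma)
    return share * 3 * MATCHES


def league_rmse(gamma, table=EPL):
    """Root-mean-square difference between actual and expected points."""
    squared = [(pts - expected_points(gf, ga, gamma)) ** 2
               for _, gf, ga, pts in table]
    return math.sqrt(sum(squared) / len(squared))


# Scan exponents from 0.50 to 3.00 in steps of 0.05 and keep the best one.
gammas = [round(0.5 + 0.05 * k, 2) for k in range(51)]
best_gamma = min(gammas, key=league_rmse)

print(f"RMSE at gamma = 1.3: {league_rmse(1.3):.3f}")
print(f"best gamma on grid:  {best_gamma:.2f} (RMSE {league_rmse(best_gamma):.3f})")
```

The table shows rounded expectations, so results computed from unrounded values may differ slightly from what you would get by squaring the residuals as printed.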

The main feature of the Jamesian Pythagorean when applied to soccer is that it consistently overestimates point totals.  Some teams overperform under both the Jamesian Pythagorean and the soccer Pythagorean, such as Newcastle, Stoke, and Wigan, and some teams identified as underperformers by the soccer Pythagorean have highly negative residuals under James’ Pythagorean.  Manchester United’s performance, which the soccer Pythagorean reads as a significant overperformance, is in line with statistical expectations according to James’ formula.

It’s possible to argue that both expectations do quite well at identifying the significant outliers in a league competition.  My perspective is that the inability to estimate the probability of draws in the Jamesian Pythagorean yields an estimator that has a persistently high RMSE and a high level of bias.  Neither characteristic is found in the Jamesian Pythagorean when applied to baseball, and I would hypothesize that neither is present when it is applied to basketball, American football, or any other sport with no draws (or very few).
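
A rough way to see the size of that bias, offered as a back-of-the-envelope illustration rather than a derivation: the James-style formula always gives a team with equal goals scored and allowed exactly half of the points on offer, whereas a drawn match pays out two points in total instead of three. Using the Pts column of the table above (which sums to 1047) and assuming a 38-match season:

$$\underbrace{\tfrac{1}{2} \times 38 \times 3 = 57}_{\text{formula, } GF = GA} \qquad\text{versus}\qquad \underbrace{\frac{1047}{20} = 52.35}_{\text{actual league average}}$$

The gap of about 4.7 points per team matches the 93 draws implied by those totals ($380 \times 3 - 1047 = 93$), each of which removes one point from the league's pool.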

So to conclude, Bill James’ Pythagorean expectation provides a lot of insight in baseball and has been adapted to other sports, but it fails in soccer.  We’ve addressed the underlying assumptions of the original formula and developed a new metric that adapts those assumptions to football competitions.  We like the results and we’re developing more metrics like it to enlighten our understanding of this great game.

  • Bryan

    It would be interesting to see if there are any other data that could be plugged into pythag expectation that would be more effective at describing win percentage. Daryl Morey, for example, found that points per possession worked more effectively for college basketball than margin of victory in the pythag expectation. I wonder if shots on target, time of possession, or something similar could be even more effective at explaining a team’s end-of-season win percentage.

    • Howard Hamilton

      Bryan: That’s a good point, and it reflects a similar discussion on one of the basketball analytics forums (link here). One of the commenters said “…I think we’ve reached the limits of PPG-level analysis. To understand Pyth better, I think we’re going to have to look for other factors, maybe possession-based stats”. We’re at that point in football, but possession-based metrics are much harder to develop.

  • fred

    Doesn’t what you call the Jamesian Pythag show a bias because the individual goal environment within which each team played the season isn’t represented in the simple equation? Once you include goal environment in the exponent, the bias disappears, in turn greatly reducing the RMSE.

    Any reason why you’re striving to reduce the RMSE of your model? Shouldn’t you be testing your estimation of a team’s true quality against something other than the actual result that you appear not to trust?

  • Lenny S.

    A bit late to finding this, but I think the biggest incompatibility is with the standings point system. I always thought that the NHL–when games ended in ties–got this right; since a tie is neither a win nor a loss, it was scored exactly halfway between: one point, versus two for a win. Using the Pythagorean wins formula for soccer isn’t going to come out right at all when a tie is 1/3 the value of a win. Pythagorean wins using the James exponent and counting ties as half-wins, however, looks much better.