My work on the soccer Pythagorean is almost done, but there were a couple of trade studies that I wanted to complete before I tied a bow around everything. There were two questions that I wanted to address:
- Is there a "universal" Pythagorean league exponent, one that applies over all leagues?
- How do league Pythagorean exponents change over seasons?
I define the league Pythagorean exponent as the (arithmetic) mean of the Pythagorean exponents of all the teams in a league. There is a standard deviation associated with the exponent as well, which tends to be between 10% and 30% of the mean.
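As a minimal sketch of that definition, the snippet below takes a list of team exponents and returns the league mean, the standard deviation, and the standard deviation as a fraction of the mean. The function name and the input values are hypothetical (the numbers are made up for illustration, not taken from any league), and the use of the sample standard deviation rather than the population one is my assumption.

```python
from statistics import mean, stdev

def league_exponent(team_exponents):
    """Return (mean, sample std dev, std dev as a fraction of the mean)."""
    m = mean(team_exponents)
    s = stdev(team_exponents)  # sample standard deviation (an assumption)
    return m, s, s / m

# Hypothetical team exponents, for illustration only (not real data)
example = [1.4, 1.6, 1.7, 1.9, 1.9]
m, s, ratio = league_exponent(example)
print(f"league exponent = {m:.2f}, std dev = {s:.2f} ({ratio:.0%} of mean)")
```

For these made-up numbers the spread works out to roughly 12% of the mean, which sits inside the 10% to 30% band described above.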
To address the first question, I collected league result data from a variety of top-flight national leagues in Europe, Asia, Africa, and the Americas. I used the result data to build goal histograms for every team in each league, fit the assumed distribution to those histograms, and then extracted the Pythagorean exponent. For the most part I considered leagues with a double round-robin format, but I also used data from leagues with split seasons (Central and South American leagues), triple or quadruple round-robin formats (Switzerland and Austria), or unbalanced schedules (MLS until this season). I looked at regular-season data only, and I focused on the 2009-10 European league season (or the 2009 season for leagues that play on a calendar-year schedule).
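The histogram step can be sketched as follows. This covers only the counting, not the distribution fit or the exponent extraction. The input layout (tuples of home team, away team, home goals, away goals), the function name, and the choice to keep separate histograms for goals scored and goals allowed are all assumptions made for illustration, and the sample results are invented.

```python
from collections import Counter, defaultdict

def goal_histograms(results):
    """Count, per team, how many matches ended with each goals-scored
    and goals-allowed total.

    results: iterable of (home, away, home_goals, away_goals) tuples
             (a hypothetical layout, not the author's data format).
    """
    scored = defaultdict(Counter)
    allowed = defaultdict(Counter)
    for home, away, home_goals, away_goals in results:
        scored[home][home_goals] += 1
        allowed[home][away_goals] += 1
        scored[away][away_goals] += 1
        allowed[away][home_goals] += 1
    return scored, allowed

# Invented results, for illustration only
results = [("A", "B", 2, 1), ("B", "A", 0, 0), ("A", "C", 2, 3)]
scored, allowed = goal_histograms(results)
print(dict(scored["A"]))   # matches in which team A scored 0, 1, 2, ... goals
print(dict(allowed["A"]))  # matches in which team A allowed 0, 1, 2, ... goals
```

Each team's counters can then be normalized by its number of matches to get the empirical frequencies that a distribution would be fit to.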
Here's the list:
Europe: 20 leagues (Austria, Belgium, England, Finland, France, Germany, Greece, Israel, Italy, Netherlands, Norway, Poland, Portugal, Romania, Russia, Spain, Sweden, Switzerland, Turkey, Ukraine)
Asia: 6 leagues (Saudi Arabia, China, Iran, Qatar, Japan, South Korea)
Africa: 3 leagues (Egypt, South Africa, Tunisia)
North America: 7 leagues (USA, Mexico, Honduras, Costa Rica, Guatemala, El Salvador, Panama)
South America: 5 leagues (Brazil, Argentina, Chile, Uruguay, Venezuela)
The result is a chart of the league Pythagorean exponents, grouped by confederation.
There is some oscillation in the league Pythagorean exponent, but the values remain within a somewhat narrow region. I calculated the mean value of the league exponent. The result: about 1.66, which is close to the 1.70 value that I've been using. It's not a definitive proof, of course, and there will always be uncertainty involved with the league exponent, but it is plausible that a Pythagorean exponent of 1.70 would work for the majority of domestic leagues.
To address the second question, I collected league data for the English Premier League going back to the 1999-2000 season and Japan's J.League Division 1 going back to 2005, when they instituted a double round-robin competition for the first time.
Below is a plot showing how the league Pythagorean exponent has varied in the English Premier League over the past 10 seasons:
You can see some oscillation in the Pythagorean exponent and a hint of a drift, but the mean values remain in the 1.55-1.75 range. It would be really neat to find out if that oscillatory behavior existed 10-20 seasons ago, but I'll leave that exercise to someone else.
And here is a similar figure for the J.League after five seasons:
It's too early to identify any oscillatory behavior, but it does appear that the league Pythagorean exponent evolves slowly. So even if the "universal" exponent changes, any bias in the projections won't be apparent for several seasons.
In conclusion, there does appear to be a single Pythagorean exponent that works well for almost all domestic leagues around the world. Whether it is universal would be tough to prove, but an exponent between 1.60 and 1.75 is sufficient to provide a good estimate of point totals. And the league Pythagorean exponents appear to be well-behaved over a number of seasons.