I've reached the final step in my effort to test my extended Pythagorean formula, using the 2008-09 English Premier League as a test case. There was a last-minute snag in my plans, however. I was looking over my equations one more time, and I decided to re-derive everything from the beginning, which is a good thing to do after putting the original work aside for a couple of days. I realized that I had made an error in distributing the exponent across the exponential terms. When I corrected it, I ended up with a much simpler expression for estimating draw probability, but the estimated point totals were worse! It took me a while to convince myself that I had done everything correctly this time.
I included the derivation in my writeup of the soccer Pythagorean and rewrote key parts of it so that it is easier to read and follow. Your comments are always welcome.
So without any further ado, here are the estimated point totals for the 2008-09 Premiership using the extended Pythagorean alongside the actual point totals. I used the league Pythagorean exponent instead of the exponents for the individual teams. I've listed the teams in order of the final standings:
|| Est. Points || Final Points
The average difference between the actual and predicted points (actual minus predicted) is -15.35 points (median -16), with a standard deviation of 4.48 points. That means that the estimated point total from the extended Pythagorean is accurate to within 4-6 games, which is about the same level of accuracy as the basic Pythagorean. Over a 162-game baseball season, that difference may not be very great. Over a 38-match soccer season, however, that difference becomes quite significant.
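As a rough illustration of how summary figures like these can be computed, here is a minimal Python sketch. The function names are hypothetical, the sign convention (actual minus estimated) and the use of the sample standard deviation are assumptions, and the two data points are only the Aston Villa and Manchester City figures quoted in this post, not the full table.

```python
from statistics import mean, median, stdev


def error_summary(estimated, actual):
    """Summarise actual-minus-estimated point differences.

    Negative values mean the estimate was too high.
    """
    diffs = [a - e for e, a in zip(estimated, actual)]
    return {
        "mean": mean(diffs),
        "median": median(diffs),
        # Sample standard deviation; whether the post used the sample or
        # population form is not stated, so this is an assumption.
        "stdev": stdev(diffs),
    }


def points_to_wins(points, points_per_win=3):
    """Express a points gap as an equivalent number of wins (3 points each)."""
    return points / points_per_win


# Only the figures quoted in this post: Aston Villa and Manchester City.
estimated = [71, 72]
actual = [62, 50]

summary = error_summary(estimated, actual)
print(summary)
print(round(points_to_wins(15.35), 1))  # the average gap expressed in wins
```

Dividing the 15.35-point average gap by three points per win gives a little over five wins, which is where the 4-6 game range comes from.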
The extended Pythagorean overestimates the number of points because it overestimates the number of wins. Again, that is consistent with the original Pythagorean estimate. It actually estimated Manchester United's win total within one match (and estimated their point total within five), but it predicted a tie between Chelsea and ManU on the basis of their identical goal-scoring records. Liverpool had a much better goal-scoring record, which moved them into a tie for first place on estimated point totals. If we had been using individual Pythagorean exponents, Liverpool would have had a clear lead. So it's possible to conclude that Manchester United played to their expectations last season, while Chelsea and Liverpool did not at critical times.
I was disappointed that I didn't get more accurate results with the extended Pythagorean, but I am very impressed that it got the relative finishing order of the teams very close. There was a three-way tie among the teams that actually finished in the top three. Arsenal and Everton finished in the spots predicted. Two of the three relegated sides were estimated correctly. It was more difficult to get teams in the lower half of the table correct; those teams are separated by such a small point difference that any estimate of the point total becomes a wash. There were three or four glaring discrepancies, however. The first was Aston Villa, who were predicted to finish 8th on 71 points, below Fulham and Manchester City, but actually finished 6th on 62. The second was Manchester City, who were predicted to finish 7th on 72 points yet finished in 10th place on 50 points. The prediction also completely missed Newcastle United's relegation: they were predicted to finish on 56 points, which would have been good enough for 14th place. The extended Pythagorean also missed Bolton and Sunderland on point totals, but on relative table position it was much closer.
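For anyone who wants to repeat the table-position comparison, here is a small Python sketch of one way to turn point totals into positions, with tied teams sharing a position. The function name and the tie-handling rule are my assumptions for illustration, and the input holds only the three estimated totals quoted above, so the positions it prints are relative to those three teams, not the full 20-team table.

```python
def table_positions(points):
    """Map each team to a table position, highest points first.

    Teams level on points share a position, and the next team skips ahead
    (1, 1, 3, ...). Real league tables break ties on goal difference, which
    this sketch ignores.
    """
    ordered = sorted(points.items(), key=lambda item: item[1], reverse=True)
    positions = {}
    for index, (team, pts) in enumerate(ordered, start=1):
        if index > 1 and pts == ordered[index - 2][1]:
            # Same points as the team directly above: share its position.
            positions[team] = positions[ordered[index - 2][0]]
        else:
            positions[team] = index
    return positions


# Estimated point totals quoted in this post (a partial table only).
estimated = {
    "Aston Villa": 71,
    "Manchester City": 72,
    "Newcastle United": 56,
}

for team, place in table_positions(estimated).items():
    print(place, team)
```

Running the same function on the actual point totals and comparing the two dictionaries team by team would show how far each side moved from its estimated position.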
So according to the extended Pythagorean, Manchester United finished as champions because they played more closely to their expectations than any other team in the league last season. Liverpool and Chelsea were serious challengers to the title but fell off the pace at key moments. Aston Villa overachieved last season, despite their late-season swoon. Manchester City underachieved and should have finished higher. At the lower end of the table, Bolton and Portsmouth had disappointing seasons and had to deal with relegation worries for most of the season. The bottom two sides deserved to go down, Hull City were very lucky not to join them, and Newcastle United were the biggest disappointment of all the teams in the Premier League. So much was expected of them, yet they found themselves relegated to the Championship.
I think that's a fairly accurate summary of last season, isn't it?
It would be nice to come up with second-order or third-order adjustments to make the Pythagorean more accurate. The problem is that I have no idea what those adjustments would look like, nor have I thought of a way to define them mathematically.