As I said in my last post on my extended Pythagorean formula for soccer, I was in the process of compiling score data for the 2008-09 English Premier League season in order to test this formulation. The first step was to fit the data to a Weibull probability distribution in order to estimate the alpha term (which is proportional to the goal average) and gamma term (which is the shape parameter of the distribution).
I am using a nonlinear least-squares algorithm to estimate the two terms. To do this, I had to form a Jacobian matrix, which requires expressions for the derivatives of the probability density function with respect to alpha and gamma. The derivative with respect to alpha isn't all that complicated, but it is tedious (the chain rule has to be applied quite a bit). The derivative with respect to gamma is more complicated because gamma appears as an exponent. After some differentiation and then some algebra, I formed the Jacobian matrix and solved the problem. The algorithm converges very well: it reaches three-digit accuracy in fewer than 10 iterations, which takes less than a second of real time. Given a couple more seconds, it produces an answer that's accurate to six digits. If enough people are interested, I'll write a description of my algorithm and post it here.
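For readers who want to see the general shape of such a fit, here is a minimal sketch. It is not the algorithm described above. It assumes a two-parameter Weibull density, f(x) = (gamma/alpha) (x/alpha)^(gamma-1) exp(-(x/alpha)^gamma), evaluated at x = goals + 0.5, and a Gauss-Newton iteration with step halving. The function names, the half-goal shift, the starting guesses and the step-halving safeguard are all illustrative assumptions.

```python
import math

def weibull_pdf(x, alpha, gamma):
    """Two-parameter Weibull density, rewritten as (gamma/x) * z * exp(-z)."""
    z = (x / alpha) ** gamma
    return (gamma / x) * z * math.exp(-z)

def weibull_grad(x, alpha, gamma):
    """Analytic partial derivatives of the density w.r.t. alpha and gamma."""
    z = (x / alpha) ** gamma
    f = (gamma / x) * z * math.exp(-z)
    df_dalpha = f * gamma * (z - 1.0) / alpha
    df_dgamma = f * (1.0 / gamma + (1.0 - z) * math.log(x / alpha))
    return df_dalpha, df_dgamma

def sse(xs, ys, alpha, gamma):
    """Sum of squared residuals between observed and modelled frequencies."""
    return sum((y - weibull_pdf(x, alpha, gamma)) ** 2 for x, y in zip(xs, ys))

def fit_weibull(xs, ys, alpha=1.5, gamma=1.5, tol=1e-10, max_iter=50):
    """Damped Gauss-Newton fit; returns (alpha, gamma, iterations used)."""
    for it in range(1, max_iter + 1):
        # Accumulate the 2x2 normal equations (J^T J) d = J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for x, y in zip(xs, ys):
            r = y - weibull_pdf(x, alpha, gamma)
            ja, jg = weibull_grad(x, alpha, gamma)
            a11 += ja * ja
            a12 += ja * jg
            a22 += jg * jg
            b1 += ja * r
            b2 += jg * r
        # Solve the 2x2 system with Cramer's rule.
        det = a11 * a22 - a12 * a12
        da = (b1 * a22 - a12 * b2) / det
        dg = (a11 * b2 - a12 * b1) / det
        # Halve the step until both parameters stay positive and the
        # sum of squares does not increase (assumed safeguard).
        current = sse(xs, ys, alpha, gamma)
        step = 1.0
        while step > 1e-8:
            na, ng = alpha + step * da, gamma + step * dg
            if na > 0 and ng > 0 and sse(xs, ys, na, ng) <= current:
                break
            step *= 0.5
        alpha, gamma = alpha + step * da, gamma + step * dg
        if abs(step * da) < tol and abs(step * dg) < tol:
            return alpha, gamma, it
    return alpha, gamma, max_iter

# Synthetic check: recover known parameters from noise-free values.
xs = [k + 0.5 for k in range(7)]              # 0..6 goals, shifted by 0.5
ys = [weibull_pdf(x, 1.8, 1.6) for x in xs]   # made-up "true" alpha, gamma
alpha_hat, gamma_hat, n_iter = fit_weibull(xs, ys)
print(alpha_hat, gamma_hat, n_iter)
```

With real data, `ys` would be the observed fraction of matches at each goal count for one club, and the fit would be run once for goals scored and once for goals allowed.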
So far I only have data from five clubs, but the alpha terms are reasonable, and the average gamma term is close to the wild guess that I made to someone in a separate email. The spread in the gammas is fairly large, but that's to be expected with only ten distributions. It should tighten when I have all 40 (goals scored and goals allowed for all 20 teams in the Premier League).
So hopefully by the end of the weekend I'll have an estimate of the exponent that best applies to the English Premier League, and can then begin estimating the total points won over the 2009-2010 season.
UPDATE: I've completed my curve-fits for all 20 teams in the 2008-09 English Premier League season, goals scored and goals allowed. I'm going to write a more complete summary in a day or so, but here are the main results:
- The mean exponent term in the English Premier League (2008-09 season) is 1.58.
- The standard deviation of that exponent term is large, around 0.3. That figure reflects how many of the goal distributions, in both attack and defense, are heavily skewed.
- There is a gamma function (the special function, not the gamma shape parameter) that doesn't cancel out in the draw probability derivations, but with the current exponent, its value is close to 1.
The next step is to write a script that computes the extended Pythagorean formula for this season based on current goal data.