There is one remaining study to complete on the Pythagorean estimate before I wrap a bow around it and move on to something else. Last weekend in San Francisco I touched briefly on the effect of offensive and defensive goal variances on the difference between the Pythagorean estimate and the actual point total (the Pythagorean residual, if you will). I wanted to determine whether there was a relationship between the variances and that residual.
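For concreteness, here is a minimal Python sketch of the residual. It assumes the usual Pythagorean form GF^γ / (GF^γ + GA^γ). The exponent `gamma` and the mapping from that fraction to league points (here, a crude 3 points × games played) are assumptions for illustration, because the post doesn't spell out either one.

```python
def pythagorean_fraction(goals_for: float, goals_against: float, gamma: float) -> float:
    """Pythagorean 'share' of results: GF^gamma / (GF^gamma + GA^gamma)."""
    gf = goals_for ** gamma
    ga = goals_against ** gamma
    return gf / (gf + ga)


def pythagorean_residual(actual_points: float, goals_for: float, goals_against: float,
                         games: int, gamma: float) -> float:
    """Actual league points minus Pythagorean-expected points.

    ASSUMPTION (hypothetical mapping): expected points = fraction * 3 * games.
    Substitute whatever fraction-to-points mapping your model actually uses.
    """
    expected_points = pythagorean_fraction(goals_for, goals_against, gamma) * 3 * games
    return actual_points - expected_points


# A team with identical goals for and against over 38 games has fraction 0.5,
# so the hypothetical mapping expects 57 points; 60 actual points gives +3.
print(pythagorean_residual(60, 50, 50, 38, gamma=1.7))  # prints 3.0
```

A positive residual means the club out-earned its goal record, and a negative one means it under-earned.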
The rationale was that if the variances are low, then the corresponding standard deviations are also low, which means that a team's goalscoring is more consistent. If two teams have identical goalscoring records, the team that scores more consistently (i.e. has a lower offensive variance) should earn more league points than expected. But teams also have to play defense, so the relationship is most likely nonlinear.
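To make the consistency argument concrete, this small Python sketch compares two hypothetical teams with the same goal total but different per-match spreads. It uses the population variance from the standard library's `statistics` module. The post doesn't say whether population or sample variance was used, so treat that choice as an assumption.

```python
from statistics import mean, pstdev, pvariance


def goal_profile(goals_per_match: list[int]) -> dict[str, float]:
    """Summarize per-match goals: total, mean, population variance, standard deviation."""
    return {
        "total": sum(goals_per_match),
        "mean": mean(goals_per_match),
        "variance": pvariance(goals_per_match),
        "std_dev": pstdev(goals_per_match),  # square root of the variance
    }


# Hypothetical six-match records with identical totals (9 goals each).
steady = [1, 2, 1, 2, 1, 2]
streaky = [0, 0, 5, 0, 4, 0]

for name, goals in [("steady", steady), ("streaky", streaky)]:
    p = goal_profile(goals)
    print(f"{name}: total={p['total']} mean={p['mean']:.2f} "
          f"var={p['variance']:.3f} sd={p['std_dev']:.3f}")
```

Both teams average 1.5 goals a match. The steady team scores in every match, while the streaky team is shut out in four of six. Under the reasoning above, that spread is what would cost the streaky side points.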
I went back through all of the national leagues that I was evaluating — almost 40 in all — and extracted team goalscoring variances and Pythagorean residuals. Then I plotted multiple overlaid scatter plots of offensive and defensive variances, color-coded by the absolute value of the Pythagorean residual.
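The extraction step might look something like the following Python sketch. It builds each team's per-match goals scored (offense) and goals conceded (defense) from a list of match results, then takes the population variance of each. The `(home, away, home_goals, away_goals)` tuple layout is an assumed input format, not necessarily the one the original data used.

```python
from collections import defaultdict
from statistics import pvariance


def team_goal_variances(matches):
    """Map team -> (offensive variance, defensive variance).

    `matches` is an iterable of (home, away, home_goals, away_goals) tuples
    (a hypothetical input format).
    """
    scored = defaultdict(list)    # goals each team scored, one entry per match
    conceded = defaultdict(list)  # goals each team conceded, one entry per match
    for home, away, home_goals, away_goals in matches:
        scored[home].append(home_goals)
        conceded[home].append(away_goals)
        scored[away].append(away_goals)
        conceded[away].append(home_goals)
    return {team: (pvariance(scored[team]), pvariance(conceded[team])) for team in scored}


results = [("A", "B", 2, 0), ("B", "A", 1, 1), ("A", "C", 0, 3), ("C", "B", 2, 2)]
for team, (off_var, def_var) in team_goal_variances(results).items():
    print(team, round(off_var, 3), round(def_var, 3))
```

Running this per league and pairing each team's two variances with its Pythagorean residual gives the points for the scatter plots.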
Below are two plots. The first shows the offensive and defensive goal variances of clubs with Pythagorean residuals greater than zero. That is, all of these clubs played above their statistical expectations.
The second shows the offensive and defensive goal variances of clubs with Pythagorean residuals less than zero. That is, all of these clubs played below their statistical expectations.
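The split behind the two figures is easy to express in code. Here is a minimal Python sketch. It assumes each team-season has already been reduced to its two variances and its Pythagorean residual. The `TeamSeason` record and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    team: str
    off_var: float   # variance of goals scored per match
    def_var: float   # variance of goals conceded per match
    residual: float  # actual points minus Pythagorean-expected points


def split_by_residual(seasons):
    """Return (over-performers, under-performers); residual == 0 lands in neither."""
    over = [s for s in seasons if s.residual > 0]
    under = [s for s in seasons if s.residual < 0]
    return over, under


def color_values(seasons):
    """Absolute residual per team, the quantity used to color-code each point."""
    return [abs(s.residual) for s in seasons]


seasons = [  # hypothetical numbers, not from the original data set
    TeamSeason("A", 1.4, 0.9, 4.2),
    TeamSeason("B", 2.1, 1.3, -3.1),
    TeamSeason("C", 1.0, 1.8, 0.0),
]
over, under = split_by_residual(seasons)
print([s.team for s in over], color_values(over))
print([s.team for s in under], color_values(under))
```

Each group's `off_var`, `def_var`, and color values could then be handed to any scatter routine, for example matplotlib's `scatter` with its `c=` argument.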
From these two figures it is not apparent how the goal variances relate to team performance relative to the Pythagorean expectation. There doesn't appear to be much correlation among the three quantities. On reflection, that actually makes some sense. The Pythagorean expectation is essentially an estimator of behavior in a league, and we're estimating the league Pythagorean exponent in the presence of uncertainty. If the estimate is good, the resulting residuals should be uncorrelated noise, with a few spikes corresponding to strong over- or under-performance.
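One way to put a number on "not much correlation" is a Pearson correlation coefficient between each variance and the residual. The post doesn't report one, so this is only a sketch of how that check could be done. Pearson's r measures linear association only, and the relationship is expected to be nonlinear, so a rank-based measure such as Spearman's would be a reasonable complement.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: offensive variances vs. Pythagorean residuals.
off_var = [1.0, 2.0, 3.0, 4.0]
residuals = [1.0, -1.0, -1.0, 1.0]
print(pearson_r(off_var, residuals))  # prints 0.0
```

The example residuals trace a clear U-shape against the variance, yet r comes out to exactly zero. This is why a near-zero coefficient alone can't rule out a nonlinear relationship.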
It would be interesting to find out (a) whether the residuals really are uncorrelated, and (b) where the Pythagorean expectation fits within estimation theory. But those questions are worthy of at least a Master's thesis or a SIAM journal article, and I have no desire to write either.
Perhaps goalscoring variances and actual league points (averaged per game) would be a better thing to look at.