For the last few months I've been working on a soccer version of the Pythagorean formula. It's taken me in some interesting directions: I've learned a lot about probability theory and reacquainted myself with the Weibull continuous distribution, which I hadn't seen since an interesting summer internship in Florida almost 20 years ago. At one point I thought I had made a breakthrough, only to find that the results I was seeing didn't make any sense, and it was only after examining the calculations that I realized the problem setup was wrong. A man has to know his limits: his limits of integration, that is.
For those of you who haven't taken calculus, you might still have heard of calculating the area under a curve. Integration is the set of mathematical tools that makes this calculation simpler and more elegant. These concepts apply to probability theory because a probability density function can be drawn as a curve on a two-dimensional plane, and the area under that curve between two points is the probability that the outcome falls between those points. Getting the limits of integration right makes all the difference in getting the probability calculation right.
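To make the area idea concrete, here is a small Python sketch that integrates a probability density numerically and checks the result against the closed-form cumulative distribution. It assumes a two-parameter Weibull density with made-up parameter values (`SCALE` and `SHAPE` are hypothetical, not fitted to any league), and it uses Simpson's rule as a generic numerical integrator; neither choice comes from the post itself.

```python
import math

def weibull_pdf(x: float, scale: float, shape: float) -> float:
    """Two-parameter Weibull density; zero for negative x."""
    if x < 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_cdf(x: float, scale: float, shape: float) -> float:
    """Closed-form area under the density from 0 up to x."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def area_under_curve(f, a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] with composite Simpson's rule (n even)."""
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Simpson weights alternate 4, 2, 4, 2, ... on the interior points.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Hypothetical parameters, chosen only for illustration.
SCALE, SHAPE = 1.6, 1.8

# Probability that a continuous "score" lands between 1.5 and 2.5:
# the limits of integration define the region being measured.
numeric = area_under_curve(lambda x: weibull_pdf(x, SCALE, SHAPE), 1.5, 2.5)
exact = weibull_cdf(2.5, SCALE, SHAPE) - weibull_cdf(1.5, SCALE, SHAPE)
print(f"numerical area = {numeric:.8f}, closed form = {exact:.8f}")
```

The two numbers agree to many decimal places; change the limits (say, 1.0 to 3.0) and both the area and the probability change with them.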
I've said a few times that you should not use a math model that you don't understand, whether simple or complicated. In my earlier adaptation of the baseball Pythagorean, I thought that I could use the existing expression for the probability of an outright win. It turns out that for soccer, because a drawn result is possible, the meaning of an outright win must change.
Say you have team X and team Y, and you want to calculate the probability of a draw between them. If you assume a continuous probability distribution, you can't use P(X = Y = c) for a whole number c, because in a continuous distribution any single value has zero probability (there are infinitely many possible values). One way around this is to assume that either team can score a fractional number of goals (2.83, for example: impossible, of course, but useful as a mathematical construct) and then calculate the probability that both teams' goal totals fall within half a goal of the same whole number (0, 1, 2, 3, …). In the figure below, the grey squares represent the regions where the goals scored by team X and team Y round to the same whole number.
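A minimal Python sketch of the draw calculation is below. It rests on assumptions the post does not spell out: each team's goals follow a Weibull distribution shifted half a goal to the left (location -0.5, so the 0-goal bin runs from -0.5 to 0.5), both teams share one shape parameter, and the two scores are independent. The scale and shape values are hypothetical. Under those assumptions, the probability of landing in bin n is a difference of two exponentials, and the draw probability is the sum of the grey-square probabilities along the diagonal.

```python
import math

def goal_bin_prob(n: int, scale: float, shape: float) -> float:
    """P(n - 0.5 <= G < n + 0.5) for a Weibull shifted left by half a goal.

    With location -0.5 the CDF is F(x) = 1 - exp(-((x + 0.5) / scale) ** shape),
    so F(n + 0.5) - F(n - 0.5) reduces to a difference of two exponentials.
    """
    return math.exp(-((n / scale) ** shape)) - math.exp(-(((n + 1) / scale) ** shape))

def draw_probability(scale_x: float, scale_y: float, shape: float,
                     max_goals: int = 50) -> float:
    """Add up the grey squares: both teams land in the same whole-number bin.

    Assumes the two goal totals are independent, so each square's probability
    is the product of the two bin probabilities. The sum is truncated at
    max_goals.
    """
    return sum(
        goal_bin_prob(n, scale_x, shape) * goal_bin_prob(n, scale_y, shape)
        for n in range(max_goals + 1)
    )

# Hypothetical parameters: team X tends to score more than team Y.
p_draw = draw_probability(scale_x=1.6, scale_y=1.2, shape=1.8)
print(f"P(draw) = {p_draw:.4f}")
```

Truncating the sum at 50 goals is only a convenience; with scales this small, the probability mass beyond that point is far below floating-point precision.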
The other regions are those where the two teams' goal totals round to different whole numbers, which means an outright win for one side. If team X has scored 2 goals, for example, the probability of a win by X is the probability that team Y has scored fewer than 1.5 goals. If team X has scored 0 goals, the probability of a win by X is zero, which makes perfect sense from the figure. It's difficult to come up with limits of integration for the entire region underneath the grey squares, but a much easier approach is to subdivide that region into rectangular regions centered on those whole numbers and integrate over each one. This makes each integration much easier, but it produces a chain of exponential terms that have to be summed. You could express the sum of those exponential terms as an integral (that squiggly symbol is a stretched-out S, which stands for "sum"), but it's not possible to simplify all of the terms further.
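The strip-by-strip sum for an outright win can be sketched in Python under assumptions the post leaves open: a Weibull distribution shifted left by half a goal, a shape parameter shared by both teams, independent scores, and hypothetical parameter values (`ALPHA_X`, `ALPHA_Y`, `GAMMA`). Strip n covers X between n - 0.5 and n + 0.5 and contributes (exp(-(n/αX)^γ) - exp(-((n+1)/αX)^γ)) × (1 - exp(-(n/αY)^γ)). That is the chain of exponential terms, added up numerically here rather than simplified.

```python
import math

def goal_bin_prob(n: int, scale: float, shape: float) -> float:
    """P(goals round to n) under a Weibull shifted left by half a goal."""
    return math.exp(-((n / scale) ** shape)) - math.exp(-(((n + 1) / scale) ** shape))

def prob_rounds_below(n: int, scale: float, shape: float) -> float:
    """P(goals < n - 0.5), i.e. F(n - 0.5) = 1 - exp(-(n / scale) ** shape)."""
    return 1.0 - math.exp(-((n / scale) ** shape))

def win_probability(scale_x: float, scale_y: float, shape: float,
                    max_goals: int = 50) -> float:
    """P(X wins outright), summed strip by strip.

    Strip n is X in [n - 0.5, n + 0.5); within it, X wins when Y < n - 0.5.
    For n = 0 that probability is zero, matching the figure.
    """
    return sum(
        goal_bin_prob(n, scale_x, shape) * prob_rounds_below(n, scale_y, shape)
        for n in range(max_goals + 1)
    )

def draw_probability(scale_x: float, scale_y: float, shape: float,
                     max_goals: int = 50) -> float:
    """P(both teams round to the same whole number)."""
    return sum(
        goal_bin_prob(n, scale_x, shape) * goal_bin_prob(n, scale_y, shape)
        for n in range(max_goals + 1)
    )

# Hypothetical parameters: X is the stronger attacking side.
ALPHA_X, ALPHA_Y, GAMMA = 1.6, 1.2, 1.8
p_x = win_probability(ALPHA_X, ALPHA_Y, GAMMA)
p_y = win_probability(ALPHA_Y, ALPHA_X, GAMMA)  # swap roles for Y's win
p_d = draw_probability(ALPHA_X, ALPHA_Y, GAMMA)
print(f"P(X wins) = {p_x:.4f}, P(draw) = {p_d:.4f}, P(Y wins) = {p_y:.4f}")
```

Computing Y's win by swapping the two scale parameters and checking that the three outcomes sum to 1 is a cheap sanity test of the limits of integration: if the strips overlapped or left gaps, the total would drift away from 1.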
Now if I haven't put you to sleep and/or totally bored you by now, I've reached the following conclusion after all this time: there is NO such thing as a simple Soccer Pythagorean expression. It might be possible to create a simplified Soccer Pythagorean expression, and such expressions might work just fine and be close enough for government work. But a simple Soccer Pythagorean formula, derived from first principles in the same fashion as the Baseball Pythagorean, does not exist.
As I just said, there are some simplified Pythagoreans that appear to be close enough. I'll discuss one of them tomorrow. (UPDATE: Or maybe not. I misread the email sent to me a few days ago.)