# Explaining the Soccer Pythagorean, in English this time

In the wake of recent conversations with people on and off this site, I realize that I really need to give a better explanation of the soccer Pythagorean formula than I have done previously.  So in the following lines I hope to give a clear explanation that's understandable to everyone and 95% math free.  (The 5% will still be understandable to everyone, even journalists!)  If there's still confusion, please let me know — whether it's via the comments, email, or my Twitter account (@socmetrics).

In its most fundamental sense, the Pythagorean formula claims to be able to estimate the number of games a team will win if we know the average number of points, goals, or runs that the team scores and allows per game.  We do this by assuming that the scoring offense and defense are distributed in the same way for every team in the league.  If that assumption holds up (and from baseball's experience it holds up pretty well), then all we need to know is the average scoring offense and defense of a team in order to determine its win percentage.  (We then multiply by the number of games to get the estimated number of wins.)
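For readers who don't mind the 5%, here is a minimal Python sketch of the classic baseball version of that idea. The function names are made up for illustration, and the default exponent of 2 is an assumption taken from the original baseball formula; the best-fitting value differs by sport and league.

```python
def pythagorean_win_pct(scored_per_game, allowed_per_game, exponent=2.0):
    """Classic Pythagorean estimate of win percentage (no draws)."""
    if scored_per_game <= 0 and allowed_per_game <= 0:
        raise ValueError("need a positive scoring or conceding average")
    scored = scored_per_game ** exponent
    allowed = allowed_per_game ** exponent
    return scored / (scored + allowed)


def estimated_wins(scored_per_game, allowed_per_game, games, exponent=2.0):
    """Multiply the win percentage by the number of games played."""
    return games * pythagorean_win_pct(scored_per_game, allowed_per_game, exponent)


# A team that scores 5 runs and allows 4 per game over a 162-game season
print(pythagorean_win_pct(5.0, 4.0))   # 25 / 41, about 0.61
print(estimated_wins(5.0, 4.0, 162))
```

Note that this version has no room for draws, which is why soccer needs the extra work described next.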

For soccer, all of the above applies, with some additional wrinkles: we must account for draws and for the unique characteristics of domestic leagues around the world.  I've looked at a number of domestic leagues in Europe, Asia, and North America, and it's really neat to see that the distribution of scoring offense and defense doesn't change very much from one league to the next.  When I say that the distribution doesn't change very much, I mean that its shape stays about the same.  That shape is represented by the Pythagorean exponent.

To account for win and draw percentage, we have to assume that we can score a decimal number of goals.  Of course we can't score 2.35 goals any more than we can have 2.35 children, but this assumption is useful for calculating percentages.  (We also assume we can score as few as -0.5 goals, which is impossible, but useful in making the mathematics easier.)  We define a drawn match as one in which the goals scored by team X and team Y are less than half a goal apart.  If the difference is greater than half a goal, then that result is defined as a win for whichever team scored more.  So we compute the win percentage by answering the following question:  If team X has scored c goals in a match, what is the probability that team X has allowed at least half a goal less than that?  We compute the draw percentage by answering this question: If team X has scored c goals, what is the probability that it has allowed anywhere from half a goal less to half a goal more?  If you consider situations where team X has scored anywhere from 0 to as many goals as you wish, weight each answer by how likely team X is to score that many goals, and then sum up all those individual probabilities, you get the total win and draw probabilities for that team.  You can use those probabilities to estimate the number of points won per game and then the total number of points at any stage of the season.
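For anyone who wants to see that recipe as a calculation, here is a minimal Python sketch. It rests on assumptions that the text above doesn't pin down: the sketch takes goals scored and goals allowed to follow a Weibull distribution shifted to start at -0.5 goals, with the Pythagorean exponent as the shared shape parameter. The exponent of 1.7 is only a placeholder rather than a fitted value, points follow the usual 3 for a win and 1 for a draw, and every function and parameter name is made up for illustration.

```python
import math

SHIFT = -0.5  # assumed lower bound on "goals", matching the -0.5 goal trick


def weibull_scale(mean_goals, exponent):
    """Scale that gives the shifted Weibull the requested average."""
    return (mean_goals - SHIFT) / math.gamma(1.0 + 1.0 / exponent)


def weibull_pdf(x, scale, exponent):
    """Density of the shifted Weibull at x goals."""
    if x <= SHIFT:
        return 0.0
    z = (x - SHIFT) / scale
    return (exponent / scale) * z ** (exponent - 1.0) * math.exp(-(z ** exponent))


def weibull_cdf(x, scale, exponent):
    """Probability of at most x goals under the shifted Weibull."""
    if x <= SHIFT:
        return 0.0
    return 1.0 - math.exp(-(((x - SHIFT) / scale) ** exponent))


def soccer_pythagorean(goals_for, goals_against, matches, exponent=1.7,
                       step=0.01, max_goals=20.0):
    """Return (win probability, draw probability, expected total points)."""
    scale_for = weibull_scale(goals_for / matches, exponent)
    scale_against = weibull_scale(goals_against / matches, exponent)
    win = draw = 0.0
    n_steps = int((max_goals - SHIFT) / step)
    for i in range(n_steps):
        c = SHIFT + (i + 0.5) * step  # midpoint of this slice of goals scored
        weight = weibull_pdf(c, scale_for, exponent) * step
        allowed_below = weibull_cdf(c - 0.5, scale_against, exponent)
        allowed_above = weibull_cdf(c + 0.5, scale_against, exponent)
        win += weight * allowed_below                     # allowed < c - 0.5
        draw += weight * (allowed_above - allowed_below)  # within half a goal
    points_per_game = 3.0 * win + draw  # 3 points for a win, 1 for a draw
    return win, draw, points_per_game * matches


# Hypothetical team: 60 goals scored, 40 allowed over 38 matches
win, draw, points = soccer_pythagorean(60, 40, 38)
print(f"win {win:.3f}, draw {draw:.3f}, expected points {points:.1f}")
```

The loop is the "sum up all those individual probabilities" step: each thin slice of goals scored contributes its own chance of happening times the chance that the goals allowed land in the win or draw range.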

So in conclusion, if we assume that offensive and defensive scoring are distributed the same way for every team in a league, and we know the number of matches played and the number of goals scored and allowed by a team, we can use the soccer Pythagorean formula to estimate its win percentage, its draw percentage, and the total number of points won.  What is really neat about this formula is that it gets quite close to actual totals in most cases.