Update on the Soccer Pythagorean derivation

** BUMPED to the top with updates added **

I want to give an update on my efforts to derive a Pythagorean (or Pythagorean-like) formula for use in soccer.  As I said in other posts, the most challenging part is coming up with the term that captures the probability of a drawn result.  This is difficult when you assume that goals follow a continuous probability distribution, because the probability that teams X and Y score exactly the same number of goals is zero under a continuous distribution.  I tried to work around this by assuming a Poisson distribution and working out the probability of a drawn result, but I ended up with some nasty infinite sums that I couldn't simplify.  Also, a Poisson distribution has only one parameter, so there is nothing else to adjust when fitting it:

P(i) = P(lambda,i)

and that lambda term is the average number of goals per match over the season.
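
To make the Poisson draw term concrete, here is a minimal numerical sketch.  It assumes the two teams' goal counts are independent Poisson variables (an assumption of this sketch, not something I have established), and the function names and the max_goals cutoff are made up for illustration.  The draw probability is the sum over n of P(lambda_x, n) * P(lambda_y, n); under the independence assumption this sum is known to equal exp(-(lambda_x + lambda_y)) * I_0(2 * sqrt(lambda_x * lambda_y)), where I_0 is a modified Bessel function, which is exactly the kind of special function that does not reduce to anything elementary.

```python
import math


def poisson_pmf(lam, i):
    """Probability of exactly i goals when the mean number of goals is lam."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def poisson_draw_probability(lam_x, lam_y, max_goals=30):
    """Approximate P(draw) for two independent Poisson scorers.

    Sums the probabilities of the scorelines 0-0, 1-1, ..., up to
    max_goals-max_goals. The cutoff is an illustrative choice; the
    terms shrink very quickly for realistic soccer scoring rates.
    """
    return sum(
        poisson_pmf(lam_x, n) * poisson_pmf(lam_y, n)
        for n in range(max_goals + 1)
    )
```

Calling poisson_draw_probability(1.5, 1.2), for example, gives the draw probability for one team averaging 1.5 goals per match and the other 1.2.  Those rates are made-up inputs, not fitted values.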

I was looking through some of the papers that I have on soccer goal distributions, and I found that some researchers use an extreme value distribution to model goal distributions during a season.  As a matter of fact, I discussed one of these papers on this blog a few months ago.  Extreme value distributions are flexible like Weibull distributions and can be used to handle extreme events in soccer, like your 6-0 or 13-1 results.  I'm thinking that this distribution might be a better one to use.  Perhaps I could also look at the probability that teams X and Y each score a number of goals within a continuous interval like [-0.5, 0.5] or [2.5, 3.5] (where the actual number of goals scored is at the center of the interval) in order to come up with a draw probability.

I have a midterm in my course this week, but I hope to have something to present on here soon.

(Oh, and an explanation of one of the math terms: the notations [a,b] and (a,b) in mathematics are used to represent an interval between a and b.  The square brackets mean that a and b are included in the interval; the round brackets mean that a and b are not included.  Infinity can never be reached, so there is always a round bracket on that end: (-∞, b] or [a, +∞). )

UPDATE (18 Oct, 9:20pm): I made a breakthrough in my derivation of a "Pythagorean" for soccer teams and leagues.  I kept the Weibull distribution and solved for the probability that teams X and Y each scored a number of goals within a continuous interval.  The actual number of goals is a discrete number that lies at the center of the interval.  I got an expression for the probability of an n-score draw, where n is the number of goals each team scores, which you can sum up to whatever number of goals you wish.  Unfortunately the resulting expression is much more complicated than the one that predicts a clear win, and includes some special functions that were canceled out in the first term.
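
The interval idea can be checked numerically with a rough sketch like the one below.  This is not the closed-form expression from my derivation; it only evaluates the Weibull cumulative distribution at the interval endpoints.  It assumes a three-parameter Weibull with a location of -0.5 (so that the interval for zero goals is [-0.5, 0.5]), a shape parameter shared by both teams, and independent scoring.  The parameter names alpha, gamma and beta, the function names, and the max_goals cutoff are all illustrative assumptions.

```python
import math


def weibull_cdf(x, alpha, gamma, beta=-0.5):
    """CDF of a three-parameter Weibull (scale alpha, shape gamma, location beta)."""
    if x <= beta:
        return 0.0
    return 1.0 - math.exp(-(((x - beta) / alpha) ** gamma))


def prob_goals(n, alpha, gamma, beta=-0.5):
    """Probability mass the Weibull assigns to the interval [n - 0.5, n + 0.5]."""
    upper = weibull_cdf(n + 0.5, alpha, gamma, beta)
    lower = weibull_cdf(n - 0.5, alpha, gamma, beta)
    return upper - lower


def weibull_draw_probability(alpha_x, alpha_y, gamma, max_goals=15):
    """Sum the n-score draw probabilities for n = 0, 1, ..., max_goals.

    Assumes teams X and Y score independently and share the shape
    parameter gamma; both are assumptions of this sketch.
    """
    return sum(
        prob_goals(n, alpha_x, gamma) * prob_goals(n, alpha_y, gamma)
        for n in range(max_goals + 1)
    )
```

Each term in the sum is the probability of an n-score draw, so changing max_goals is the "sum up to whatever number of goals you wish" step.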

I'll write a separate post that presents the complete formula and my derivation in an attached PDF.

UPDATE #2 (19 Oct, 11:30pm): I forgot to move this post to the top.  And I won't be able to write that post tonight.  I had too much going on; sorry.

