Maybe goalscoring is Poisson

I see that Chris Anderson has been having fun with nice-looking scatter plots lately, so why let him have all that to himself?

I've had my doubts that you could describe goal distributions as Poisson, at least when it comes to deriving further expressions from them. A formulation of the soccer Pythagorean doesn't work with a one-parameter Poisson distribution. And besides, it's rare for the expected goals to be identical to the variance, which is the defining property of the Poisson distribution. So the Poisson distribution doesn't fit soccer, right?

Well, I have been looking at goal scoring data from the various European leagues that I used in my Pythagorean study, and I've been plotting the means and variances of each team's goals scored and allowed over the course of a season. Below is such a plot from France's Ligue 1 (2009-10 season), with a V sketched to indicate where the mean and variance are identical, which is the Poisson distribution line. Positive means indicate goals scored and negative means indicate goals allowed; variances are always positive.
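For anyone who wants to build the same kind of plot, here is a minimal Python sketch of how each team's two points could be computed. The match records are made-up placeholders, not the Ligue 1 data, and the helper names (`plot_points`, `poisson_line`) are my own. I am also assuming per-match goal counts and the sample variance (dividing by n - 1); the population variance would be a defensible choice too.

```python
from statistics import mean, variance

# Hypothetical per-match records: (goals scored, goals allowed).
# These are illustrative numbers only, not real Ligue 1 results.
matches = {
    "Team A": [(2, 0), (1, 1), (0, 1), (3, 2), (1, 0), (2, 2)],
    "Team B": [(0, 1), (1, 3), (1, 1), (0, 0), (2, 1), (0, 2)],
}

def plot_points(results):
    """Return the two (x, y) points one team contributes to the plot."""
    scored = [gf for gf, _ in results]
    allowed = [ga for _, ga in results]
    # Goals scored are plotted at +mean, goals allowed are mirrored to -mean.
    # variance() is the sample variance (n - 1 in the denominator).
    return [
        (mean(scored), variance(scored)),
        (-mean(allowed), variance(allowed)),
    ]

def poisson_line(x):
    """Height of the V at horizontal position x: variance equals |mean|."""
    return abs(x)

points = {team: plot_points(results) for team, results in matches.items()}

for team, pts in points.items():
    for x, y in pts:
        if y > poisson_line(x):
            side = "above"
        elif y < poisson_line(x):
            side = "below"
        else:
            side = "on"
        print(f"{team}: mean={x:+.2f} variance={y:.2f} ({side} the Poisson line)")
```

Feeding the resulting points to any scatter plot routine and overlaying `poisson_line` gives the V-shaped picture described above.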


The means and variances in the actual goal distribution data don't match the Poisson ideal perfectly. But they're close enough. From visual inspection it looks like roughly half of the teams have goal statistics on either side of the Poisson line, but I need to do a more formal analysis to make sure.
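One possible way to put a number on the "roughly half on either side" impression is a two-sided sign test, sketched below in Python. This is not the formal analysis itself, just one candidate for it. It assumes the points are independent, which is questionable since every goal scored by one team is a goal allowed by another. The (mean, variance) pairs are hypothetical placeholders, and `count_sides` and `sign_test_p` are names I made up.

```python
from math import comb

def count_sides(points):
    """Count points above and below the line variance = |mean|.

    Points lying exactly on the line are dropped, as is usual for a sign test.
    """
    above = sum(1 for m, v in points if v > abs(m))
    below = sum(1 for m, v in points if v < abs(m))
    return above, below

def sign_test_p(n_above, n_below):
    """Two-sided sign test p-value under H0: a point is equally likely
    to fall above or below the Poisson line."""
    n = n_above + n_below
    k = min(n_above, n_below)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double for the two-sided test, capped at 1.
    return min(1.0, 2 * tail)

# Hypothetical (signed mean, variance) pairs, not real league data.
example_points = [(1.5, 1.1), (-1.0, 0.8), (0.7, 0.9), (-1.7, 2.3), (1.2, 1.0)]

above, below = count_sides(example_points)
p_value = sign_test_p(above, below)
print(f"above={above} below={below} p={p_value:.3f}")
```

A large p-value would be consistent with the points scattering evenly around the Poisson line. A small one would suggest systematic over- or under-dispersion.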

There are some more interesting results when one color-codes the circles for teams at the very top or bottom of the table.  But that will wait for a later post.

The data also indicate that perhaps a two-parameter Poisson distribution would make for a better goalscoring model and be more tractable for other calculations.  It would be worth studying.