Markov Chain Monte Carlo simulations for the 2010 WC

Way back at the beginning of this site, I linked to a page that simulated match results for the 2006 World Cup finals, based on a Markov Chain Monte Carlo simulation.  That same site is preparing a similar simulation for the 2010 World Cup.

In that long-ago post I promised to explain MCMC simulations, but I never did.  It's time to make amends.

Monte Carlo simulations are relatively straightforward to explain: they are brute-force calculations in which certain input quantities are drawn at random from a statistical distribution (often Gaussian) and the calculation is repeated many times.  "Many" can be hundreds, thousands, or even millions of runs, depending on the complexity of the problem.  The spread of results across all those runs gives an estimate of the answer.  Monte Carlo simulations are useful for solving problems in which an exact analytic solution is difficult to find or does not exist.
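To make the brute-force idea concrete, here is a minimal sketch in Python.  It estimates the chance that a team wins at least two of its three group matches and compares the estimate with the exact answer, which is easy to work out in this toy case.  The per-match win probability of 0.4, the function names, and the number of trials are all made up for illustration.

```python
import random

def estimate_two_or_more_wins(p_win, n_trials, seed=0):
    """Monte Carlo estimate of P(at least 2 wins in 3 independent matches)."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(n_trials):
        # Play three matches; each is won with probability p_win.
        wins = sum(rng.random() < p_win for _ in range(3))
        if wins >= 2:
            hits += 1
    return hits / n_trials

def exact_two_or_more_wins(p_win):
    """Exact answer: exactly two wins (3 ways) plus exactly three wins."""
    return 3 * p_win**2 * (1 - p_win) + p_win**3

p = 0.4  # hypothetical per-match win probability
estimate = estimate_two_or_more_wins(p, 100_000)
exact = exact_two_or_more_wins(p)
print(f"Monte Carlo estimate: {estimate:.4f}, exact: {exact:.4f}")
```

The estimate gets closer to the exact value as the number of trials grows, which is why these simulations are run so many times.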

A Markov chain is a random process that moves between a discrete set of states at successive time steps and has the Markov property, which states that the next state depends only on the present state and not on the history that led to it.  If you know the probability of moving from each state to every other state at each time step, then the Markov chain lets you evaluate the probability that the process will end a certain way.  (This is a greatly simplified version, so if I don't have it exactly right, feel free to have at me in the comments.)
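A small sketch may help here.  Imagine a tied knockout match that is settled round by round: from the "level" state, each round either produces a winner or leaves things level, and once a team has won the process stays put.  The three states and the transition probabilities below are invented for illustration, not taken from any real data.

```python
STATES = ["level", "home_wins", "away_wins"]

# Hypothetical transition probabilities; each row sums to 1.
# The two "wins" states are absorbing: once reached, the chain stays there.
TRANSITIONS = {
    "level": {"level": 0.5, "home_wins": 0.3, "away_wins": 0.2},
    "home_wins": {"home_wins": 1.0},
    "away_wins": {"away_wins": 1.0},
}

def step(dist):
    """Advance a probability distribution over states by one time step."""
    nxt = {s: 0.0 for s in STATES}
    for state, prob in dist.items():
        # Markov property: only the current state matters, not how we got here.
        for target, p in TRANSITIONS[state].items():
            nxt[target] += prob * p
    return nxt

def distribution_after(n_steps, start="level"):
    """Distribution over states after n_steps, starting in a known state."""
    dist = {s: 0.0 for s in STATES}
    dist[start] = 1.0
    for _ in range(n_steps):
        dist = step(dist)
    return dist

final = distribution_after(50)
for state in STATES:
    print(f"{state}: {final[state]:.4f}")
```

After enough steps almost all of the probability has drained out of "level" and into the two end states, in the ratio 0.3 to 0.2.  That is the "probability that the process will end a certain way".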

So what the MCMC simulation is doing in this instance is evaluating the win/loss/draw probability of each match-up in the group phase, then using a Markov chain formulation to calculate the probability of each team advancing to the knockout stage.  The process is repeated for each ensuing match-up through the rest of the tournament, and the resulting probabilities are combined to predict a winner.  The Monte Carlo part comes into play by repeating those steps many, many times and counting how often each outcome occurs.
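Here is a rough sketch of the group-phase part, using plain Monte Carlo repetition.  It is not the site's actual method, which I haven't seen.  The four teams, their pairwise win/draw/loss probabilities, and the random tie-break are all assumptions made for illustration (a real tournament breaks ties on goal difference and other criteria).

```python
import itertools
import random

TEAMS = ["A", "B", "C", "D"]

# Hypothetical (win, draw, loss) probabilities for the first-named team.
MATCH_PROBS = {
    ("A", "B"): (0.50, 0.25, 0.25),
    ("A", "C"): (0.60, 0.20, 0.20),
    ("A", "D"): (0.70, 0.20, 0.10),
    ("B", "C"): (0.45, 0.30, 0.25),
    ("B", "D"): (0.55, 0.25, 0.20),
    ("C", "D"): (0.40, 0.30, 0.30),
}

def simulate_group(rng):
    """Play one round-robin group and return the two teams that advance."""
    points = {t: 0 for t in TEAMS}
    for first, second in itertools.combinations(TEAMS, 2):
        p_win, p_draw, _ = MATCH_PROBS[(first, second)]
        r = rng.random()
        if r < p_win:
            points[first] += 3
        elif r < p_win + p_draw:
            points[first] += 1
            points[second] += 1
        else:
            points[second] += 3
    # Simplifying assumption: teams level on points are separated at random.
    ranked = sorted(TEAMS, key=lambda t: (points[t], rng.random()), reverse=True)
    return ranked[:2]

def advancement_probabilities(n_trials=20_000, seed=2010):
    """Fraction of simulated groups in which each team finishes in the top two."""
    rng = random.Random(seed)
    counts = {t: 0 for t in TEAMS}
    for _ in range(n_trials):
        for team in simulate_group(rng):
            counts[team] += 1
    return {t: counts[t] / n_trials for t in TEAMS}

probs = advancement_probabilities()
for team in TEAMS:
    print(f"{team}: {probs[team]:.3f}")
```

The four numbers add up to 2 because exactly two teams advance in every simulated group.  Extending the sketch to the knockout rounds would mean feeding each simulated pair of qualifiers into the bracket and repeating the same kind of random draw for every later match.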

One downside of the simulation is that it depends heavily on getting the pairwise probabilities right.  The site in question uses "expert" analysis and previous head-to-head results, and we all know how right the "experts" are most of the time.  Head-to-head results could be equally useless to predict winners because of the lineups and the circumstances surrounding previous matches.  But there's not much of an alternative, anyway.  It would be interesting to see how pairwise probabilities get calculated; perhaps there could be an opportunity for the newly completed Soccer Pythagorean (which is nothing more than a win/draw probability estimate).

As the website said, the simulation won't do a very good job of predicting a winner, but it might be useful for developing a betting strategy to maximize profits during the tournament.  It won't stop the press from shouting "Computer simulation picks Spain/England/Brazil to win World Cup", of course.  But perhaps this simulation could provide some opportunities to win a bet or two when I go to Europe this summer.