Trinomial trees, Markov chains, and qualifying odds for the World Cup

I have a couple of posts in the pipeline for my other website that explore a hypothetical merger between CONCACAF and CONMEBOL, and I started thinking about the probability of Caribbean countries qualifying for the World Cup from CONCACAF and how that might change in a larger 50-team region.  (With 30 associations, the Caribbean countries would maintain the balance of power in a merged American confederation.)  That got me thinking: just what is the probability of any national team qualifying for the World Cup?

An exact answer doesn't exist, but I think it could be approximated by looking at the problem as a trinomial tree.  A trinomial tree is a computational tool for pricing options, but at this point I'm more interested in the conceptual model.  A soccer team has three possible match results: win, lose, or draw.  (I'm putting aside complicating factors like the away goals rule or penalty kick shootouts.)  Each match has those three possible outcomes, so it's possible to visualize a team's path through a competition with the tree below:

[Figure: trinomial tree of win/draw/loss outcomes after two matches]
That's after two games, and there are nine possible paths that a team can take.  After three matches (say, group play at the World Cup), there are 27.  In fact, there are 3^n possible paths for a team playing n games.
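
To make the path counting concrete, here is a minimal Python sketch that enumerates every win/draw/loss path and adds up the probability of each final points total.  The function names and the match probabilities are my own placeholders, picked only for illustration, and the sketch assumes the matches are independent and scored 3-1-0.

```python
from collections import defaultdict
from itertools import product

# Points awarded for each result (standard 3-1-0 scoring).
POINTS = {"W": 3, "D": 1, "L": 0}


def enumerate_paths(n):
    """Return every win/draw/loss sequence for n matches (3**n of them)."""
    return list(product("WDL", repeat=n))


def points_distribution(match_probs):
    """Probability of each final points total, summed over all paths.

    match_probs holds one dict per match, mapping "W", "D", "L" to a
    probability.  Matches are assumed to be independent of each other.
    """
    dist = defaultdict(float)
    for path in product("WDL", repeat=len(match_probs)):
        path_prob = 1.0
        path_points = 0
        for result, probs in zip(path, match_probs):
            path_prob *= probs[result]
            path_points += POINTS[result]
        dist[path_points] += path_prob
    return dict(dist)


# Hypothetical probabilities for a three-match group stage.
example = [{"W": 0.5, "D": 0.25, "L": 0.25}] * 3

print(len(enumerate_paths(2)), len(enumerate_paths(3)))
print(points_distribution(example))
```

Brute-force enumeration is fine for a handful of matches, but the number of paths triples with every game, so a longer series needs something smarter than listing them all.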

So how might you use this to calculate qualification probabilities?  I'm still thinking my way through the process, but you would first have to come up with the probability of winning a match against a given opponent.  Perhaps you could use a ranking system (Elo, SPI, even FIFA) to derive win/draw/loss probabilities against an opponent when playing home or away.  There are certain point totals that guarantee defeat in a two-match series (0 or 1 points), as well as point totals that guarantee success (4 or 6).  Point totals of 2 or 3 could result in a win by aggregate goals, the away goals rule, or the penalty kick tiebreaker, which makes any odds calculations complicated in a hurry.

If you put those tiebreakers aside for a moment, it's possible to model the series result probabilities as a Markov chain, which is a useful tool for modeling discrete processes where the state of the process at a future step depends only on the state at the current step.  There are separate Markov chains during the qualification process in CONCACAF: up to two two-match series (two for the bottom 22 teams, one for everyone else), one six-match series, and one ten-match series.
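
As a rough illustration of the Markov chain idea for a two-match series, the Python sketch below treats the points total so far as the state and advances it one match at a time.  Everything named here (step, two_leg_series, and the home and away probabilities) is a hypothetical placeholder rather than anything derived from a real ranking system, and the tiebreakers are deliberately left unresolved.

```python
# Points awarded for each result (standard 3-1-0 scoring).
POINTS = {"W": 3, "D": 1, "L": 0}


def step(state_probs, match_probs):
    """Advance the chain by one match.

    state_probs maps points-so-far to probability.  match_probs maps
    "W", "D", "L" to the probability of that result in the next match.
    The next state depends only on the current points total.
    """
    next_probs = {}
    for points, p_state in state_probs.items():
        for result, p_result in match_probs.items():
            new_points = points + POINTS[result]
            next_probs[new_points] = (
                next_probs.get(new_points, 0.0) + p_state * p_result
            )
    return next_probs


def two_leg_series(home_probs, away_probs):
    """Classify a two-match series by final points total.

    0 or 1 points means defeat, 4 or 6 means success, and 2 or 3 goes to
    tiebreakers (aggregate goals, away goals, penalties), which this
    sketch does not attempt to resolve.
    """
    state = {0: 1.0}  # every team starts the series on zero points
    for match_probs in (home_probs, away_probs):
        state = step(state, match_probs)
    return {
        "advance": state.get(4, 0.0) + state.get(6, 0.0),
        "eliminated": state.get(0, 0.0) + state.get(1, 0.0),
        "tiebreak": state.get(2, 0.0) + state.get(3, 0.0),
    }


# Hypothetical probabilities: stronger at home than away.
home = {"W": 0.5, "D": 0.3, "L": 0.2}
away = {"W": 0.3, "D": 0.3, "L": 0.4}
print(two_leg_series(home, away))
```

The same step function could be run over six or ten matches to get a points distribution for the longer series, although turning a points total into a qualification probability there would also depend on what the other teams in the group do.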

As I said, this can get complicated very quickly, and I know that I need to flesh out all the details.  It has the makings of a very intriguing problem — as if I don't have enough to do already.
