The entrails from the 2014 FIFA World Cup have cooled, and the thoughts of football fans have turned to domestic competitions and the Champions League, but I have finally pulled myself away from maintaining and updating the Soccermetrics API to analyze data from the World Cup. I start with an assessment of effective playing time from the tournament.
I wrote in my previous post about the challenge of calculating effective playing time with a data set that does not record the time and field location where the ball leaves the playing area. The major implication is that effective time can only be estimated instead of calculated, so I created a regression model between the number of times the ball leaves the field and the estimated time lost, which I then use to adjust the effective time. The resulting algorithm (a finite state machine) is more complicated because it must have memory of previous events: the throw-in, corner kick, or goal kick event and the event that occurs immediately before it. Finite state machines are hard enough to debug, and more so with memory.
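
To make the idea of a state machine with memory concrete, here is a minimal sketch. It is not the Soccermetrics implementation: the event labels, the `(seconds, label)` event format, and the rule that the whole gap before a throw-in, corner kick, or goal kick counts as dead time are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IN_PLAY = auto()
    STOPPED = auto()


# Hypothetical event labels; a real feed will use its own vocabulary.
BALL_OUT_RESTARTS = {"throw_in", "corner_kick", "goal_kick"}
OTHER_RESTARTS = {"kick_off", "free_kick"}
STOPPAGES = {"foul", "offside", "goal", "half_end"}


@dataclass
class Tally:
    effective: float = 0.0      # seconds with the ball in play
    ball_out_lost: float = 0.0  # seconds attributed to ball-out events
    ball_out_count: int = 0     # number of ball-out restarts seen


def nominal_effective_time(events):
    """Walk time-ordered (seconds, label) pairs and tally playing time."""
    tally = Tally()
    state = State.STOPPED
    prev_time = None  # the "memory": when the previous event happened
    for time, label in events:
        gap = 0.0 if prev_time is None else time - prev_time
        if label in BALL_OUT_RESTARTS:
            # The exit itself is not recorded, so the whole gap since the
            # previous event is treated as lost (an overestimate).
            if state is State.IN_PLAY:
                tally.ball_out_lost += gap
                tally.ball_out_count += 1
            state = State.IN_PLAY
        elif label in OTHER_RESTARTS:
            state = State.IN_PLAY
        elif label in STOPPAGES:
            if state is State.IN_PLAY:
                tally.effective += gap
            state = State.STOPPED
        elif state is State.IN_PLAY:
            # Ordinary on-ball event (pass, shot, ...) during live play.
            tally.effective += gap
        prev_time = time
    return tally


events = [
    (0, "kick_off"), (5, "pass"), (12, "pass"), (30, "throw_in"),
    (34, "pass"), (40, "foul"), (65, "free_kick"), (70, "pass"),
    (90, "corner_kick"), (95, "half_end"),
]
tally = nominal_effective_time(events)
print(tally)
```

Because the gap before each ball-out restart also contains some live play, this kind of tally overstates the time lost, which is why a separate correction for ball-out events is needed.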
So, word to the data suppliers: record ball-out events, and ensure that all of the spatiotemporal data is included in every technical/tactical event.
At any rate, nominal effective times are calculated using the finite state machine model, and the values are then adjusted by the offset between the expected time lost due to ball-out events and the time lost as calculated by the model. The uncertainty range in the regression coefficients is used to create upper and lower ranges of the estimated playing time.
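
The adjustment can be sketched as follows. This is an illustration under stated assumptions, not the actual model: it assumes a linear regression of time lost on the number of ball-out events, a sign convention in which the state machine overstates time lost, and made-up coefficient values. Taking the extremes of both coefficients at once is a simplification that ignores any correlation between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coefficient:
    """A regression coefficient with its uncertainty range."""
    low: float
    mid: float
    high: float


def adjusted_effective_time(raw_effective, model_lost, n_ball_outs,
                            intercept, slope):
    """Return (lower, nominal, upper) effective time in seconds.

    raw_effective: effective time straight from the state machine
    model_lost:    time the state machine attributed to ball-outs
    n_ball_outs:   number of ball-out restarts (assumed non-negative)
    """
    def adjust(a, b):
        expected_lost = a + b * n_ball_outs
        # Give back the part of the model's lost time that the
        # regression says was not actually lost.
        return raw_effective + (model_lost - expected_lost)

    # More expected time lost means less effective time, so the high
    # coefficients give the lower bound and vice versa.
    lower = adjust(intercept.high, slope.high)
    nominal = adjust(intercept.mid, slope.mid)
    upper = adjust(intercept.low, slope.low)
    return lower, nominal, upper


# Hypothetical coefficients (seconds, and seconds per ball-out event).
intercept = Coefficient(low=20.0, mid=40.0, high=60.0)
slope = Coefficient(low=10.0, mid=12.0, high=14.0)

lower, nominal, upper = adjusted_effective_time(
    raw_effective=3300.0, model_lost=1500.0, n_ball_outs=80,
    intercept=intercept, slope=slope,
)
print(lower, nominal, upper)
```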
Here are a couple of tables to present effective playing time on a match and team level. The first table is the estimated effective playing time for every match of the World Cup over the first 90 minutes, with nominal playing time in descending order. Eight matches in the knockout stage were decided in extra time, but I only consider effective time over the regulation period.
| Group/KO | Round | Home Team | Away Team | Lower Range (mm:ss) | Nominal (mm:ss) | Upper Range (mm:ss) |
|---|---|---|---|---|---|---|
| F | 2 | Nigeria | Bosnia and Herzegovina | 69:31 | 72:26 | 74:44 |
| F | 1 | Argentina | Bosnia and Herzegovina | 63:33 | 67:32 | 70:36 |
| F | 3 | Bosnia and Herzegovina | Iran | 58:05 | 62:39 | 66:08 |
Effective playing time in 2014 FIFA World Cup matches, with nominal, upper and lower range of estimated time shown. Data sourced from Press Association MatchStory feed and served from the Soccermetrics Connect API.
The amount of effective playing time is higher than one sees for domestic competitions, and I would imagine that it’s higher than previous World Cups. The average effective time comes out to exactly sixty minutes, which is about five to six minutes more than what I’ve seen for other competitions. Because it’s not possible to know exactly how much time has been lost due to ball-outs, I calculate upper and lower bounds, which puts most of the matches in ranges that appear to make sense to me. I don’t know what other people have calculated with respect to effective time, but I recall seeing a tweet from Opta that USA v Germany had close to 73 minutes effective time, which seems to correspond well with my estimate.
All of the USA’s matches were in the top 12 in terms of effective playing time, which says something about the frenetic pace of play in their matches but not much about their possession characteristics. I am not surprised to see that Brazil vs Colombia had the least amount of effective time, but it’s when you dive into the match stoppage data that you realize how discontinuous and unpleasant that match was. Most of Brazil’s games were in the lower tier in terms of effective time, with the telling exceptions of the super-intense Brazil vs Chile Round of 16 match and the incomprehensible Brazil vs Germany semifinal. I was surprised that most of Colombia’s matches were in the bottom tier as well, with the exception of the matches against Uruguay and Greece (!!).
The second table is the average effective playing time for matches involving the 32 finalists.
| Team | Average Effective Time (mm:ss) |
|---|---|
| Bosnia and Herzegovina | 67:32 |
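
As a quick check on how a team-level figure relates to the match-level ones, the sketch below averages the nominal times of every match a team was involved in. The helper names are hypothetical, and it assumes the team average is a plain mean of nominal values; the match data are the three Bosnia and Herzegovina rows from the first table.

```python
from collections import defaultdict


def to_seconds(mmss):
    """Convert an 'mm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(seconds):
    """Convert seconds to an 'mm:ss' string, rounding to the nearest second."""
    total = round(seconds)
    return f"{total // 60}:{total % 60:02d}"


def team_averages(matches):
    """Average nominal effective time over each team's matches."""
    per_team = defaultdict(list)
    for home, away, nominal in matches:
        secs = to_seconds(nominal)
        # A match counts toward both participants.
        per_team[home].append(secs)
        per_team[away].append(secs)
    return {team: to_mmss(sum(v) / len(v)) for team, v in per_team.items()}


# (home, away, nominal effective time) taken from the match table.
matches = [
    ("Nigeria", "Bosnia and Herzegovina", "72:26"),
    ("Argentina", "Bosnia and Herzegovina", "67:32"),
    ("Bosnia and Herzegovina", "Iran", "62:39"),
]

averages = team_averages(matches)
print(averages["Bosnia and Herzegovina"])  # 67:32
```

Under that assumption the result reproduces the 67:32 shown for Bosnia and Herzegovina. The other teams in this sample appear in only one listed match each, so their values here are not tournament averages.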
Now, average effective time is a bit misleading in my opinion, and especially so in isolation, but I think it’s possible to gain some insight from the results. I am struck that three of the bottom four teams in the table are Group A sides, with Mexico the notable exception. But Colombia being involved in matches with the least amount of effective time is very surprising to me. It’s a result that makes me want to learn more. And the USA involved in matches with the most effective time? Well, this tournament forced a lot of observers to revise their conceptions of the US national team; maybe this is one example.
There’s a lot more analysis that one can do with effective playing time, but this is a good place to stop and pick it up later. In a future post I’ll create the same type of regression model that I created for Premier League and J-League matches to assess which teams might have had the greatest impact on effective time, as well as referee impacts, if any.