Last week I had the privilege of attending the OptaPro Analytics Forum in London. Organized by OptaPro, Opta Sports’ professional services arm, the event brought together in one room a select group of football analytics researchers, performance analysts, and other officials from the professional football clubs that use Opta’s data services. The Senate House at Birkbeck, University of London was the venue for the day-long conference, and approximately 50 people attended.
A quick summary of the Analytics Forum: in November OptaPro issued a Call for Proposals for research projects that would make use of its technical/tactical data sets. (Researchers could use their own data sets as well, but this was not required.) A panel of judges — Ian Graham, Blake Wooster, Chris Anderson, Sam Green, and Devin Pleuler — evaluated the proposals in a blind process, and the final group would be given access to Opta data and invited to present their work. In the end, almost 60 proposals were evaluated and nine projects were selected. With only six weeks between final selection and the presentations, and Opta’s F24 data feed being far from the easiest to parse, I feared that many of the presenters wouldn’t have much to show.
Fortunately, my fears were misplaced. The presenters, to a person, took pains to point out that their work was very much in progress and results were preliminary; nevertheless, the content of the presentations was of excellent quality. (The quality of the presentations themselves was all over the map, but in general analytical people tend to make their presentations more functional than beautiful.)
In my opinion, these were the main themes of the conference:
- The start of game models in football. By a ‘game model’ I mean a framework that accepts in-match data and simulates expectations of player and team performance. Work in this area has progressed in fits and starts over the last five years, and at this conference a research group from Onside Analysis presented early results on a ‘multi-agent simulation engine’ that generates predictive simulations from in-match events. Dr David Hastie, who presented the research, emphasized that the work was at an early stage, but the group used the model to assess the impact of certain player transfers on the outcomes of several teams in the English Premier League (Robin van Persie was one example). The results may be preliminary, but the model appears to enable the formulation of interesting questions in the sport.
- The introduction of player similarity. I remember being asked at the 2010 NCSSORS whether there was anything resembling a PECOTA in soccer, and I replied that such a system was at least 5-10 years away. We may still be a few years away from a full-on career prediction model, but we saw a couple of presentations that developed similarity models from in-match data. Jonathan Gruhl presented similarity scores calculated in part from in-match behaviors, which, combined with summary statistics, compare players not only on performance but also on playing style and characteristics. Marek Kwiatkowski focused on classifying central midfielders by the type and frequency of their passing behaviors, and identified roughly the same group of similar midfielders as Gruhl.
- Usefulness of subjective data. A lot of researchers are wary of subjective data because of the lack of standard processes for collecting those metrics. But a couple of presentations made use of subjective data and demonstrated that it can be valuable in illuminating areas of player and team performance. David Hastie made a great point that there is indeed a lot of value in subjective data if one goes about collecting it the right way. There are perhaps one or two conditionals too many in that sentence, but he does make a good point. Ben Woolcock used Opta’s Big Chances statistic — a metric that meets some resistance even at Opta headquarters! — to demonstrate the effectiveness of teams in converting chances that were lightly defended versus those that were closely guarded. I expected not to like the presentation, but by the end I thought the application of that metric was quite clever.
- Appearance of statistical rigor. What I liked about the talks was that most of the presenters were serious about statistical rigor at some level. That’s not to say that there were proofs, but the researchers were open and precise about the processes they used to arrive at their results.
- The unease of the professional football analysts. I don’t think ‘unease’ is quite the right word, but I’m struggling to find a better one for their reactions during the Q&A periods. It appeared that many of the club analysts were interested in the research projects, yet reluctant to pose specific questions or recommendations for fear of telegraphing valuable information to their competitors. In that respect, Manchester City analyst Pedro Marques’ presentation on modeling football actions as a social network was most impressive. Not only did he communicate what is possible through Manchester City’s use of data analytics and interaction with outside firms, he presented this information without fear. I think this presentation should have been given first.
- Manchester United were very lucky last season. Okay, this isn’t really news, but I found it humorous that a variety of studies at the conference, coming from multiple directions, concluded that Man U overachieved significantly in 2012-13. I had noticed the same thing: they have overperformed substantially in the soccer Pythagorean table over the last two seasons.
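To give a flavor of what a predictive simulation from match events might look like, here is a deliberately crude Monte Carlo sketch — this is not the Onside Analysis multi-agent engine, just independent Poisson goal counts for each side (the goal rates are invented):

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's method for drawing a Poisson-distributed goal count."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_outcomes(lam_home, lam_away, n=20000, seed=42):
    """Estimate win/draw/loss probabilities by simulating n matches
    with independent Poisson goal counts for each side."""
    rng = random.Random(seed)
    wins = draws = losses = 0
    for _ in range(n):
        h = sample_poisson(lam_home, rng)
        a = sample_poisson(lam_away, rng)
        if h > a:
            wins += 1
        elif h == a:
            draws += 1
        else:
            losses += 1
    return wins / n, draws / n, losses / n

# Illustrative goal rates (invented): a stronger home side.
p_win, p_draw, p_loss = simulate_outcomes(1.8, 0.9)
```

A real game model would condition those rates on in-match events and player interactions; the point here is only the shape of the simulate-and-aggregate loop.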
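The similarity-score idea above can be sketched very roughly as a distance measure over per-player stat vectors. All players and numbers below are invented, and cosine similarity is just one plausible choice of measure, not necessarily what Gruhl or Kwiatkowski used:

```python
import math

# Hypothetical per-90 stat vectors (passes, tackles, dribbles, shots)
# for three made-up players. A and B have similar profiles; C does not.
players = {
    "A": [60.0, 2.5, 1.0, 0.8],
    "B": [58.0, 2.8, 0.9, 0.7],
    "C": [30.0, 0.5, 3.5, 3.0],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two stat vectors: values near 1.0
    indicate players whose stat profiles point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_ab = cosine_similarity(players["A"], players["B"])
sim_ac = cosine_similarity(players["A"], players["C"])
```

In practice one would standardize each feature first so that high-volume counts like passes don’t dominate the comparison.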
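The social-network view of football actions that Marques described can be illustrated with a toy pass network: players as nodes, completed passes as weighted directed edges. The pass log below is entirely invented, and degree centrality is the simplest possible network metric:

```python
from collections import defaultdict

# Toy pass log of (passer, receiver) pairs — invented for illustration.
passes = [
    ("Silva", "Toure"), ("Toure", "Aguero"), ("Silva", "Aguero"),
    ("Kompany", "Silva"), ("Toure", "Silva"), ("Silva", "Toure"),
]

# Weighted adjacency: edge weight = number of completed passes.
adj = defaultdict(int)
for passer, receiver in passes:
    adj[(passer, receiver)] += 1

# Degree centrality: total passes a player is involved in, either end.
degree = defaultdict(int)
for (passer, receiver), weight in adj.items():
    degree[passer] += weight
    degree[receiver] += weight

# The player on the most pass links is the network's hub.
hub = max(degree, key=degree.get)
```

Richer analyses would look at betweenness or clustering to find the players a team’s build-up actually flows through.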
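For readers unfamiliar with the Pythagorean idea mentioned above: it projects points from goals scored and conceded, so a team whose actual points far exceed the projection looks “lucky.” The sketch below is a naive two-term version with a placeholder exponent and no draw handling — not the soccer Pythagorean model itself, and the goal figures are illustrative:

```python
def pythagorean_points(goals_for, goals_against, exponent=1.7, matches=38):
    """Naive Pythagorean-style estimate of league points.

    The exponent (1.7 here) is a placeholder, and draws are ignored,
    so treat the output as a rough benchmark, not a calibrated model.
    """
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return 3 * matches * gf / (gf + ga)

# Invented goal record: a side scoring twice as often as it concedes.
projected = pythagorean_points(80, 40)
```

Comparing a projection like this against actual points earned is what flags a season as overperformance.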
Among the nine selected projects were the following:
- Assessing player performance and tactics using a multi-agent football simulation engine (Hastie, Edwards, Eastwood)
- Similarity scores using Opta data (Gruhl)
- If you don’t buy a ticket, can you still win the raffle? A taxonomy of shot-making in football (Page)
- Can we measure how much more teams leave themselves open at the back when chasing down a lead, and which teams tend to shell when they go a goal up? A study of creative and defensive efficiencies as a function of game state, i.e., scoreline relative to the home team (Woolcock)
You might be hearing about some of them on future Soccermetrics Podcasts.
In my view, the OptaPro Analytics Forum is the best soccer analytics conference since the 2011 NESSIS. It’s the kind of event that I’ve been waiting to see, and it’s significant that it was held in London. It was a pleasure to meet those of you who attended the conference at University of London, and I really hope that it becomes an annual event.