Eric Vormelker hijacked my celebratory post with an article from Brian Phillips in Slate that questioned the move toward analytics in soccer. (Quite all right, Eric!) The article hits on points that I’ve made in previous posts on soccer analytics. Being self-referential in blog posts is bad form, so I’ll just point you to the posts that I’ve made in the High Level Discussions category. But perhaps it’s best to summarize previous comments in a single post, and I’ll incorporate those comments in my response to the article.
I have learned over the years that journalists and numbers do not get along very well; the former are uncomfortable working with and talking in numbers, and it shows in too many articles. This article isn’t quite so bad (there is actually much to recommend it as a corrective to overly optimistic talk about sports analytics), but some spots illustrate the unease that many journalists feel toward anything numerically inclined. Phillips’ objections to the growth of soccer analytics can be condensed to three arguments: soccer is too complicated to be explained numerically, there is very little agreement on the assumptions used to generate metrics, and the proprietary nature of match data and metrics prevents the growth of grassroots analysts.
Phillips makes an excellent point about soccer in that it is a simple game that is actually quite complex. He writes:
“There’s just one problem with the sport’s newfound sophistication, which is that soccer happens to be a quaint, starry-eyed endeavor that can’t be explained by the numbers…”
I can’t find much to disagree with in that statement. But I’d amend the last part slightly:
“There’s just one problem with the sport’s newfound sophistication, which is that soccer happens to be a quaint, starry-eyed endeavor that can’t be explained fully by the numbers…”
which I think makes a more accurate statement. Soccer is a simple game of great complexity, sublime beauty, and sheer unpredictability, yet a game with discernible patterns. (When was the last time you watched a match and said that this is looking like one of those typical 0-0 games, or a 2-0 game with the second goal on a late breakaway?) It is possible to describe some aspects of the game in the language of mathematics, and perhaps develop some subtle understanding of the sport as well. But I don’t think you can describe everything about soccer in particular, and sport in general, solely by numbers. Moreover, I believe that those who claim so risk not just their own credibility, but also that of the wider sports analytics community. Like any good and honest researcher in the physical sciences, those in the sports analytics community need to be up front about the uncertainties involved with the metrics that they develop. Then the end users can decide if the results that are generated by these new measurements truly mean anything. It reminds me of the saying I learned in graduate school: “Everyone believes an experimentalist’s results except for the person who conducts the experiment, and no one believes a computationalist’s results except for the person who writes the simulation!”
The second argument that Phillips makes against soccer analytics is that all of the metrics come with their own set of assumptions that sporting managers have to discern. This is not unique to sports analytics; all scientific, technical, and engineering research starts with a set of assumptions and attempts to proceed from there. Again, it is the duty of analysts to be up front about the assumptions that are used, but there will be assumptions. Soccer is a complex game and you can’t explain everything, but you can explain some things, and you have to start somewhere. It’s up to the sporting managers to determine if those underlying assumptions hold up to initial scrutiny. So yes, despite the use of soccer analytics, front-office people at professional clubs will still have to use their brains. Sorry.
The third argument Phillips makes, about the proprietary nature of soccer analytics, is a more serious one, and one whose implications I have wrestled with since starting this site (and company). Baseball sabermetricians had a much easier job when it came to developing advanced metrics in that game data were already available. They may not have been easy to find, but there were game details dating back to the 19th century, and all it took was a community of people to compile those data. Soccer faces a much more severe challenge in that even results data are incomplete, and results provide little information beyond who played, scored, got substituted, or was ejected. Detailed in-match data were difficult to collect and often unreliable before the video era, and not many statistics were collected anyway (assists weren’t collected by FIFA until 1994, which happened to coincide with a World Cup in the USA). It is easier to collect in-match data for matches in Europe and North America, but obtaining those data is still difficult due to ownership issues, particularly in Europe. It would be possible to collect match result data in a collaborative open-source fashion, but in-match data collection will remain behind the firewall of the clubs and the league. Open-source soccer analytics have their place, but the scope is likely to be more limited than it is with baseball or other sports.
So Brian Phillips says that soccer analytics has limitations because of the sport’s innate complexity, the varying list of assumptions between proprietors, and the lack of an open-source ethos in the field. To the first two, I respond, “Yes, but I don’t have a problem with either one.” The problems are hard, and there will always be assumptions to be made, and both matters require researchers with high levels of integrity and transparency within reason. To the third, I must admit that I am still trying to come up with a coherent policy on that issue, but I do aim to help grow the soccer analytics community. If more people understand my work, more will be accepting of it. That’s the hope, anyway.
UPDATE: It’s a Big Soccer-lanche! Thanks!