Moneyball and soccer: two years later

Two years ago I started Soccermetrics in order to write about statistical problems in soccer in a more technical and uninhibited way. My initial post was one that I had written on my HexagonalBlog website as a way to answer some questions in my mind about the viability of an analytical approach to soccer.  Now that this blog has reached its two-year anniversary, perhaps it’s time to revisit the issues that I raised there.

I am sure that many people will find it odd that, despite having launched a website, started a (very fledgling) business, and developed a number of metrics and tools that have found support in the analytics community, I am ultimately skeptical about the viability of soccer analytics.  I don’t believe that this skepticism is a bad thing; on the contrary, I believe that it is useful to be clear-eyed about the problems and shortcomings in order to formulate a strategy for solving or overcoming them.

A little over two years ago, I posed the question, “Can a ‘Moneyball’ approach be successful in soccer?”  This was my answer at the time:

“While I believe that there are some very intriguing questions about the applications of statistics to soccer, in general I think the answer to the original question is no.”

I answered in the negative for two reasons: one, the complexity of the sport, which makes the development of tractable high-fidelity models difficult, and two, the problem of incomplete or nonexistent data in parts of the world where demand for players is highest.  At the time I forgot to add a third reason, which is the skepticism, if not outright hostility, of soccer people toward an analytical view of the game.  From my experience this hostility has been strongest amongst the members of the press and (some) supporters.  It’s not necessarily without merit in my view, but it does present problems for decisionmakers if analytically-based selections go wrong.

It is very true that soccer is a sport that does not lend itself easily to statistical analysis.  Basketball, ice hockey, and American and Australian football have the same challenges as soccer in terms of determining which players are most influential to a team’s success, which indicates that the commonly generated statistics do not tell the entire story.  However, those four sports have a large set of accepted statistical measurements for their players; soccer’s set of statistics is very limited, and the assessments that are made are qualitative in nature.  When people involved in soccer, whether coaches, fans, or writers, say that statistics are misleading and one can’t trust them in soccer, I agree with them.  But my response is not to discard quantitative analysis in soccer altogether.  My response is to develop better measurements.

In order to develop better measurements, I think it is important to start any kind of analysis from first principles: establish assumptions about a particular problem that are logical and defensible, express those assumptions in the language of mathematics, and then work from basic mathematical and statistical principles to develop solutions.  That is, if you wish to be exact.  Close enough is just as good most of the time, and I love reading posts that present back-of-the-envelope calculations to develop some interesting metrics for soccer.  Of course, it would be great for credibility and sanity reasons for those “exact” and “inexact” solutions to be very close.

Actually, I think it’s essential for credibility reasons that those two solutions are close, and that the solutions pass the smell test.  I also believe that it is important that developers of analytics be clear-eyed about the limitations of their methods, which gets into the idea of not using metrics that one doesn’t understand.  While the point of these analytical tools is to find players and actions that are underrated or overrated by the conventional wisdom, it is still reassuring for the decisionmakers to confirm some of their expectations.  I am not sure that the current collection of soccer analytics is suitably mature for use in the front office.  There’s not much of a “collection” of tools, at least as far as I can tell, and I think there is still some debate as to which tools or measurements might actually be useful to a front-office person (or a member of the coaching staff).  In the course of developing analytics tools, we all need to answer the questions, “Is the problem relevant to the end user?” and “Is the problem compelling to the end user?”

And what type of problems or issues might be relevant or compelling to the end user?  I would classify them into three categories: pre-match preparation, pre-season squad selection (in club football), and in-season adjustments.  I guess I shouldn’t be surprised by the German preparations for the World Cup, but I was impressed that they were so systematic and comprehensive in their analysis of player actions with and without the ball, as well as interactions between other players.  Such analysis is very useful for generating match strategy and tactics, but the challenge in such analysis is to convert findings into actions, and preferably no more than three actions for players.  In-season adjustment between matches presents the best opportunity for the use of soccer analytics, and managers such as Arsène Wenger are known for using a data-based approach for assessing player performance and determining the best time to make tactical decisions.  (I’m much less optimistic about the use of analytics for in-match adjustments; there are simply too many variables to consider in the heat of a match, as Bill Polian pointed out in the Limits of Moneyball session at last year’s MIT SSAC.)  Pre-season squad selection appears to be another area of great interest to players, but there is much more public-relations risk involved with making selections in this manner.  The risk of committing Type I errors (selecting “can’t miss” talent that turned out to be a dud) and Type II errors (taking a pass on available players who became stars somewhere else) is not all that different between managers who have “eyes for talent” and those who choose to use a more analytical approach, but in my opinion the latter type of decisionmaker will receive little leeway from the press or from supporters.  What people observing the sports industry fail to appreciate is how little data are being used to make critical decisions, especially when it comes to drafting young talent.

The lack of data in an appropriate format for analysis is something that I believe limits the expansion of soccer analytics.  This is a different challenge from baseball, which has a large statistical dataset covering the last 100 years.  Much of the soccer data are either scattered across the Web, incomplete, unreliable, or behind closed doors.  In-match data are especially suspect, as most leagues outside the developed world simply don’t tabulate them.  Europe also has the issue of database rights, which means that the owner of the data can prohibit others from copying them, provided that they come from a database.  (It’s one reason why, when I started work on my soccer database design, I did not model in-match data.)  The lack of suitable data is a problem because it makes advanced statistical analysis in the sport more difficult.  When we talk about potential inefficiencies in the transfer market, and whether certain indicators of future talent exist, it’s difficult to know for sure because we don’t have the data to conduct those kinds of studies.  We’d especially like to be able to do studies that follow players from the elite junior level to the pros, but the data would only exist in developed countries, and even there they would be incomplete and inconsistently collected.

So can a “Moneyball” approach be successful in soccer?  There are some fascinating problems in the sport, and after two years of running this website I see some of the possible applications, including (but not limited to) talent selection, mid-season adjustments, and pre-match preparation.  I also see some of the roadblocks, such as the lack of data models suitable for analytics work.  So I would have to amend my earlier “No” to “Not sure”.  There will be more openness to the approach in the soccer community, but a lot of people are still trying to decide what to make of it, including the soccer analysts themselves.