Moneyball and soccer: four years later

When I first started the Soccermetrics blog, I wrote a long post on whether the concept of “Moneyball” could be applied to soccer.  Two years after that, I wrote a follow-up.  Now that we’re at the four-year mark of Soccermetrics, it’s time to revisit the subject.

In my initial post at Soccermetrics, I posed the question, “Can a ‘Moneyball’ approach be successful in soccer?”  This was my answer at the time:

While I believe that there are some very intriguing questions about the applications of statistics to soccer, in general I think the answer to the original question is no.

Two years later, I revised my answer:

There are some fascinating problems in the sport, and after two years of running this website I see some of the possible applications, including (but not limited to) talent selection, mid-season adjustments, pre-match preparation, and I also see some of the roadblocks, such as the lack of data models suitable for analytics work.

So I would have to amend my earlier “No” to “Not sure”.

In the two years since that last post, analytics has expanded across industries and exploded into the public consciousness.  Read or listen to the broader media and you will find stories claiming that everything is Moneyball (unless it’s Big Data), whether it’s manufacturing, finance, marketing, or even politics.

Michael Lewis’ book jumped from the printed page to the silver screen, earning several Oscar nominations in the process and making it easier for anyone running a data analytics company to explain what they do to their parents, friends, or spouses.  I’ve used the term “Moneyball for soccer” to explain my company to prospective investors and laypeople, and I will very likely continue to do so for the near future.

That said, I’ve voiced this opinion before, though not on this blog, and it’s time for me to say it: I hate the term “Moneyball”.

Yes, I am aware that “Moneyball” is a catch-all term for data analytics among the general public.  Yet as Ben Alamar states in his excellent post, Moneyball strategies are a subset of analytics; analytics are not equivalent to them.  To the extent that we in the soccer analytics community treat analytics as synonymous with Moneyball, that conflation limits our thinking about their applications and undercuts our relevance to the soccer industry.

(“Big Data” isn’t a great catch-all term for the soccer industry, either.  It conjures images of crunching terabytes of data at a time, when in reality the amount of data that sports teams and other parts of the industry handle is much smaller.  Yet it is still overwhelming, which is where Bruno Aziza’s proposal comes in: redefine “Big Data” as the situation where a company’s data infrastructure can’t meet its needs.)

For example, there are undervalued players in every domestic league around the world, as there have been since the start of the modern player transfer system.  As more data become available on professional players, it’s no surprise that some clubs will incorporate those data into their decision-making.  Some players have been purchased on the strength of their match statistics with great success (Shinji Kagawa by Borussia Dortmund), and others have been sold because of declining statistics that didn’t reflect the player’s true performance (Jaap Stam by Manchester United).

In both decisions, in-match statistics such as passes, shots, tackles, and clearances were used to evaluate a player’s performance level, but which statistics actually indicate performance?  More importantly, are they better indicators than random chance?  Aside from the obvious statistics (Lionel Messi’s goals, for example), is it realistic to expect in-match statistics to convey useful information about a player in a game that depends on time, space, and player positions?
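One way to make “better than random” concrete is a permutation test: shuffle the outcome values many times and count how often a shuffled version correlates with a statistic at least as strongly as the real data do.  The Python below is a minimal sketch under stated assumptions: the function names, the per-90 tackle figures, and the match ratings are hypothetical, and Pearson correlation is only one possible choice of test statistic.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def permutation_p_value(stat, rating, n_perm=10_000, seed=0):
    """Estimate how often random shuffles of `rating` match or beat the observed |r|."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = abs(pearson(stat, rating))
    shuffled = list(rating)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between stat and rating
        if abs(pearson(stat, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: tackles per 90 minutes and an invented match rating.
tackles_per90 = [1.2, 2.5, 3.1, 0.8, 2.0, 2.9, 1.5, 3.4]
ratings = [5.9, 6.8, 7.2, 5.5, 6.4, 7.0, 6.1, 7.5]
p = permutation_p_value(tackles_per90, ratings)
print(f"p = {p:.4f}")
```

A small p-value only says the statistic tracks the chosen rating more closely than chance would; it says nothing about whether the rating itself measures what a club cares about.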

Within the last two years there have been attempts to extract higher-level information about player and team performance through network systems analysis.  From Sarah Rudd’s application of Markov chains to technical/tactical data to Javier López Peña and Hugo Touchette’s extension of Luís Amaral’s network analysis of soccer matches, researchers are beginning to view individual match events as components within a networked system in order to exploit contextual and spatial information.  These methods show promise, but they face three challenges: developing measures of player effectiveness or impact that are meaningful to an end user; enabling dynamic analysis that tracks changes in player and team performance over the course of a match; and establishing whether the results are statistically significant.
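The published models are far more detailed than anything that fits here, but the core idea of treating match events as linked states can be sketched as a generic absorbing Markov chain (not a reproduction of Rudd’s model).  In this hypothetical Python example, a possession moves between three pitch zones until it ends in a shot or a turnover; the zone names and transition probabilities are invented for illustration and are not drawn from any real dataset.  Solving for the absorption probabilities gives the chance that a possession in each zone eventually produces a shot, a context-aware value that a raw pass count cannot provide.

```python
# Hypothetical transition probabilities between possession states.
# Each row sums to 1; "shot" and "turnover" are absorbing (possession ends).
TRANSITIONS = {
    "defensive_third": {"defensive_third": 0.30, "middle_third": 0.45, "turnover": 0.25},
    "middle_third": {"defensive_third": 0.15, "middle_third": 0.30,
                     "final_third": 0.30, "turnover": 0.25},
    "final_third": {"middle_third": 0.20, "final_third": 0.25,
                    "shot": 0.25, "turnover": 0.30},
}


def shot_probability(transitions, target="shot", tol=1e-12, max_iter=10_000):
    """Probability that a possession starting in each transient state ends in `target`.

    Uses fixed-point iteration on p[s] = P(s -> target) + sum_t P(s -> t) * p[t],
    where t ranges over transient states (the keys of `transitions`).
    """
    value = {state: 0.0 for state in transitions}
    for _ in range(max_iter):
        new_value = {}
        for state, row in transitions.items():
            total = 0.0
            for nxt, prob in row.items():
                if nxt == target:
                    total += prob  # possession ends in a shot immediately
                elif nxt in transitions:
                    total += prob * value[nxt]  # continue from a transient state
                # any other absorbing state (e.g. turnover) contributes 0
            new_value[state] = total
        converged = max(abs(new_value[s] - value[s]) for s in value) < tol
        value = new_value
        if converged:
            break
    return value


probs = shot_probability(TRANSITIONS)
for zone, prob in probs.items():
    print(f"{zone}: {prob:.3f}")
```

Iteration keeps the sketch dependency-free; for a larger state space, solving the linear system (I - Q)x = r directly (for example with `numpy.linalg.solve`) would be the more usual choice.  Crediting a player with the change in this value when they move the ball between zones is one way to turn such a model into a measure of impact, though whether that measure is meaningful to an end user or statistically significant is exactly the open question.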

Over the next two years, tracking player health and performance through GPS technology will assume greater importance.  A Norwegian company named ZXY has developed load cells and heart rate monitors to track players in Norwegian domestic matches, and adidas’ miCoach is set to be used in Major League Soccer matches starting in the spring.  In-match tracking in international matches depends on FIFA approving GPS units, and FIFA’s current reluctance has been viewed less as a technology issue than as a safety issue.  Perhaps the experience of the International Rugby Board and the Australian Football League will encourage a national FA to propose that FIFA change the relevant law.  After the decision on goal-line technology late last year, this is the technology area that FIFA is most likely to approve next.  GPS-based player tracking already exists on the training pitches of major clubs around the world, but in-match monitoring would, for the first time, create a dataset of in-match health data that can be overlaid on technical/tactical data, connecting the work of sports scientists and data analysts.

At the edges of the playing field, analytics are being created to understand the performance of managers and match referees.  Tom Markham has received much attention for creating a statistical model that assesses the performance of football club managers throughout the season.  Erik van den Berg has done some interesting work on the inefficiencies present in transfer systems in European football.  Without divulging names, I am aware of groups who are collecting fine-grained data on referee performance in domestic league matches.  These analytics are important because they produce measures that the economic buyer, the person who controls the budget, would use directly.

Everything in analytics comes down to data.  I said two years ago that the lack of available data in a format suitable for analysis limits the ability of the soccer analytics community to advance the state of the art.  The Performance Analysis department at Manchester City FC sought to address this issue by starting the MCFC Analytics project, which made summary statistical data on the 2011-12 English Premier League available to armchair analysts around the world.  I had my own initial thoughts on the dataset, which I’ve since revised after gaining a better understanding of MCFC’s and Opta’s intentions, but the project does lower the barrier to entry for amateur analysts and curious fans, which ultimately serves the interests of both parties.  At the other end of the spectrum, soccer data companies now have close to 15 years of data on matches and players, effectively a generation and a half of players whose performances have been tracked quantitatively.  We’re now starting to see research into player career histories and tracking (the Prozone presentation on player archaeology at Leaders in Performance is one example), but such work remains well behind that in baseball and other sports.

I hope I’ve been able to demonstrate that football analytics goes much deeper and broader than “Moneyball”.  More than a quest for the one metric that identifies over- or under-valued players, analytics are the use of statistical analysis of data to solve relevant and compelling questions in and around the game, from in-match performance to front-office management to media storytelling and infotainment.  So when I reconsider the question, “Can a Moneyball approach be successful in soccer?”, I now respond, “You’re asking the wrong question!”

The better question is, “Can data analytics become a valuable and indispensable tool in soccer?” I don’t know the answer to that question for sure, but we’re developing tools and applications to enable analytics across the entire industry and community.  If you want to be part of that, join us!