This is the first of a three-part preview of the 2012 MIT Sloan Sports Analytics Conference. In this post I will take a look at the ten finalists of the Research Paper Competition.
The Sloan Sports Analytics Conference started a Research Paper session in 2010, which was coincidentally my first year at the conference. It’s been a very popular addition from the start and has since grown into a more sophisticated and formalized process. First, researchers submit an abstract of their work, and the authors of selected abstracts are invited to submit full papers. The finalists present at the Conference, and the winning presentation and paper receive US$7,500 (second place receives US$2,500).
The first thing that strikes me as I look at the final paper list is how basketball-heavy the group is. Of the ten papers, seven are basketball-related, and of those seven all but one focus on the NBA. I think there are several reasons for this.
First, the SSAC, at its core, is a baseball and basketball analytics conference, and most of the panelists and attendees have come from those two sports. (They don’t make up a majority any more, but they’re still a significant bloc at the conference.)
Second, basketball data is rich enough that a motivated independent researcher (independent meaning “not employed by an NBA team”) can produce good-quality research. In that respect it shares baseball’s advantage: a deep body of historical data that supports sophisticated analysis. Basketball needed more post-processing, play-by-play data in particular, but the large body of historical data allowed the basketball community to release play-by-play statistics at low or no cost.
I’m sure that one counter-argument is that there are plenty of soccer analytics researchers doing interesting and compelling work, and the papers and posters presented at the recent NESSIS indicate that. But those researchers were working from an in-match dataset supplied by StatDNA. The type of research one can do with historical data remains limited relative to basketball and other sports. (It’s still quite good, don’t misunderstand me, but it’s a different kind of work from what’s being shown at the SSAC.)
Now, let’s talk about the research paper finalists. If there’s one consolation for the heavy basketball content, it is that basketball analytics has a lot of parallels with soccer analytics, and advances in one field have applications to the other. In general that’s true of all of the invasion team sports.
One paper that I am looking forward to hearing is one that incorporates an expected goals model in an advanced plus/minus rating for the National Hockey League. It continues work that the author, Prof. MacDonald (an Assistant Professor at USMA-West Point), presented as a poster at the SSAC last year. There’s been quite a bit of discussion on advanced plus/minus ratings in soccer, and one of the complications of employing that metric is the high level of statistical noise in the result. Prof. MacDonald uses multiple predictor variables to create expected goals and combines that result with ridge regression to obtain a smoother plus/minus metric. (The mean-squared error goes down, but in his table of the top ten players there is no standard deviation attached to the plus/minus ratings.)
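To make the ridge regression idea concrete, here is a minimal sketch of how a penalized plus/minus rating can be computed. This is not Prof. MacDonald’s model: the data are simulated, the player effects and the penalty value `lam` are made up for illustration, and the response `y` simply stands in for whatever per-shift quantity (goals or expected goals) one chooses to model.

```python
import numpy as np

def ridge_plus_minus(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) beta = X'y."""
    n_players = X.shape[1]
    A = X.T @ X + lam * np.eye(n_players)
    return np.linalg.solve(A, X.T @ y)

# Simulated (hypothetical) data: each row is a shift, each column a player.
# +1 = on the ice for the home side, -1 = for the away side, 0 = on the bench.
rng = np.random.default_rng(0)
n_shifts, n_players = 200, 6
X = rng.choice([-1.0, 0.0, 1.0], size=(n_shifts, n_players))

# Made-up "true" player effects and a noisy per-shift response.
true_effect = np.array([0.5, 0.2, 0.0, 0.0, -0.2, -0.5])
y = X @ true_effect + rng.normal(scale=2.0, size=n_shifts)

ols = ridge_plus_minus(X, y, lam=0.0)     # ordinary least squares (no penalty)
ridge = ridge_plus_minus(X, y, lam=50.0)  # penalized ratings, shrunk toward zero

for i, (b_ols, b_ridge) in enumerate(zip(ols, ridge)):
    print(f"player {i}: OLS {b_ols:+.3f}  ridge {b_ridge:+.3f}")
```

The penalty pulls every rating toward zero, which is what damps the noise in the unpenalized estimates. In practice the penalty would be chosen by something like cross-validation rather than fixed by hand.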
Another variation on the advanced plus/minus metric is the “Skills Plus Minus” presented by a group of three quantitative finance analysts. The objective of this research is to measure team chemistry in the NBA by incorporating each player’s offensive and defensive skills (quantified by match statistics) within the multiple regression structure of the advanced plus/minus metric. The authors claim that this framework allows one to identify which groups of players work well together, and which free agents are the best fits for particular teams. It’s very sophisticated research (would you expect anything different from a group of quants?) and something that a lot of NBA executives would be interested in.
Two papers at the Conference present results from new data measurement technologies. The more interesting of the two is Kirk Goldsberry’s paper on CourtVision, an analytics platform that incorporates spatial information to create new metrics of offensive performance in a basketball game. Goldsberry is a postdoctoral researcher in Geography at Harvard, and it appears that his skill set is really valuable for looking at basketball, or any other invasion sport, in a different way. The other paper, by a group of University of Southern California researchers, studies rebounding behaviors using STATS optical tracking data. It’s interesting work, but a research team could do the same thing in soccer with Prozone or Tracab data.
Finally, the paper that uses cumulative win probability to predict performance in the NCAA basketball tournament could have implications for prediction markets. I like looking at such papers to see how researchers in other sports are approaching the issue, while recognizing that soccer competitions have some unique characteristics.
So in conclusion, just because none of the papers in the Research Paper track focus on soccer doesn’t mean that there aren’t cross-pollination opportunities. The big issues for those of us in the soccer analytics community are data richness and data access — it’s usually one or the other, but rarely both.
(CORRECTION: The Research Paper Competition started in 2010, not 2009. I changed the text.)