This weekend I traveled to the San Francisco Bay Area to participate in the Northern California Symposium on Statistics and Operations Research in Sports (NCSSORS) at Menlo College in Menlo Park. This symposium is the West Coast counterpart to the New England Symposium on Statistics in Sports (NESSIS), which takes place every odd year at Harvard University.
Both conferences complement MIT's Sloan Sports Analytics Conference. While SSAC presents high-level discussions on the state of sports analytics in the broadest terms possible, before increasingly large audiences, NCSSORS and NESSIS are more intimate events where statistical researchers and quantitative analysts get together to present and discuss topics in statistical modeling and analysis across a variety of sports. Despite the differences in size (SSAC attendance this year was 1200, NCSSORS attendance about 70), both conferences attract leading experts in their fields.
There were two types of presentations at NCSSORS. The first were the eight oral presentations, which were held throughout the day on statistical modeling and analysis topics in sport. The second were the poster presentations (about 20), which were displayed on easels throughout the day, with a dedicated session during lunchtime. As you all know, yours truly had a poster on the soccer Pythagorean:
The colors aren't exactly what I wanted to use (I would have preferred blue and white to be consistent with the company logo), but after finishing the poster at 2:30am I wasn't about to go fiddling with the color settings. The poster was easy to spot, which is always a good thing at meetings like these.
The conference organizer made a good effort to have oral presentations that covered a variety of sports, such as Australian rules football, cricket, baseball, basketball, golf, soccer, and American football. The paper on Aussie rules was interesting to me as a fan of the AFL and as one curious to learn how home advantage manifests itself during a match. However, the findings would be of principal use to the betting houses, which is not an issue here. The second paper that I listened to was on optimal promotion/relegation in sports leagues, and the authors were focusing on applying the P/R system to professional golf (which has one between the PGA and Nationwide Tours) and the NBA. It was very clear listening to the presenters that they had no familiarity with the concept of P/R until very recently, when the principal author came across a description of its use in the English soccer leagues. They sought to determine the optimum number of teams to promote or relegate to make a more competitive league, which is fine, but they applied it to the NBA (setting up three divisions) and tested their approach by having teams in the three divisions play each other, which defeats the purpose of P/R! I felt the authors were missing one additional element of promotion/relegation, which is not only to improve competitiveness but also to allow elite teams to play at a level commensurate with their ability. A study on the optimum number of teams to promote and relegate to achieve both outcomes would have been great, but this paper did not do that, which was ultimately frustrating.
In the afternoon, I listened to a paper on ranking the top goalscorers in the major European soccer leagues (England, Germany, Spain, Italy, France). I'm going to have to save my irritation at the professor for another post, particularly when it comes to the whole "soccer vs. football" argument. I actually think that the idea of looking only at the elite goalscorers in these leagues is a good one, as is the attempt to map the goalscoring statistics to a standard normal distribution N(0,1) and then use z-scores and percentiles to rank goalscorers. There were some interesting results, as the top-ranked goalscorer under this scenario was Mamadou Niang (ex-Marseille, now Fenerbahce). This particular type of analysis has implications for valuing goalscorers, and it would be interesting to see how these percentile ratings evolve with time for each player.
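To make the z-score idea concrete, here is a minimal sketch in Python. It is not the presenters' actual method, which I don't have in front of me; it simply standardizes a scoring rate against the group mean and standard deviation and reads the percentile off the standard normal CDF. The player labels, the scoring rates, and the function name `rank_by_z_score` are all made up for illustration.

```python
from statistics import NormalDist, mean, pstdev


def rank_by_z_score(rates):
    """Return (name, z_score, percentile) tuples, best scorer first.

    `rates` maps a player label to a scoring rate (e.g. goals per game).
    The z-score is measured against the mean and population standard
    deviation of the group itself, and the percentile comes from the
    standard normal CDF. This assumes the rates are roughly normal.
    """
    values = list(rates.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all rates are identical; z-scores are undefined")

    std_normal = NormalDist(0.0, 1.0)
    rows = []
    for name, value in rates.items():
        z = (value - mu) / sigma
        rows.append((name, z, 100.0 * std_normal.cdf(z)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical goals-per-game figures, not real player data.
scoring_rates = {
    "Striker A": 0.62,
    "Striker B": 0.48,
    "Striker C": 0.55,
    "Striker D": 0.41,
}

ranking = rank_by_z_score(scoring_rates)
for name, z, pct in ranking:
    print(f"{name}: z = {z:+.2f}, percentile = {pct:.1f}")
```

Standardizing within each league before comparing across leagues would be one way to put scorers from different competitions on a common scale, though whether the paper did exactly that is my assumption.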
The most useful presentations, however, were the two keynote talks by performance analysts with professional sports teams: Sig Mejdal of the St. Louis Cardinals and Roland Beech of the Dallas Mavericks. Sig's background was of special interest to me as he also worked at NASA for a time (I was never able to ask him which Center he worked at), and he discussed the type of data they have at their disposal and the types of analysis that he and his team use. Analysis ranges from multiple linear regression to more sophisticated techniques, but he warned of diminishing marginal utility with these more complicated algorithms; there is definitely an advantage to keeping it simple for researchers and decision-makers alike. At the same time, critical decisions are being made on very limited data, particularly when it comes to drafting college and especially high school athletes. So there is a need to use both scouting and data analysis in draft decisions. There is always the fear that one will make a selection that turns out to be a dud (the Type I errors), or miss a great talent due to a statistic or physical feature that masks his ability (the Type II errors).
Roland's work in the sports business is different from Sig's in that as a "stats coach", he is on the bench with the team for every NBA game. His data collection during the game is limited by NBA rules (no electronic aids allowed in the bench area), and he has very little time to interact with the head coach, so he must pay close attention to off-the-ball actions and nuances in play, present his main points clearly, and decide before the game on what he is going to present. His advice to quantitative analysts who would like to make the jump to a professional team is threefold: develop original and creative work, speak the language of sports decision makers, and make sure that the work is compelling and credible. Credibility is the most critical element, in my opinion; statistical analysis will always be viewed with suspicion, and it is very important that it be presented clearly and honestly without any hint of overselling.
I was hesitant before the conference as to whether I would find it useful; I was one of just two soccer-related people there and the other person wasn't really there to interact with the other researchers. But I enjoyed the posters and the conversation with the analysts in other fields. The established analysts like Sig and Dean Oliver were all very encouraging of my efforts and gave me some very valuable perspective on their work. It was especially good to meet Dean Oliver, whose work on basketball analytics has been of consistently high quality. I think we're both recognizing the parallels in the modeling and analysis challenges of soccer to those of other sports, so any advances made in this sport will be closely watched in others.
Once again, thanks to Ben Alamar for putting on a productive conference and selecting a good mix of speakers and posters. I hope to see most of them in Boston next year!