Hello, and greetings from 34,000 feet! I decided to get started on my review of the Sloan Sports Analytics Conference. This will be the first of three or four posts covering different sessions at the event. I took pictures on Saturday, but the software I use to upload them is on my other computer; I'll link those pictures into this post when I return home.
(Apologies for some of the blurry images; I didn't want to use flash indoors but couldn't keep the camera still in my hand during the long shutter times.)
Saturday was the MIT Sloan Sports Analytics Conference, or "Dorkapalooza", as ESPN's Bill Simmons calls it. In previous years it was held on campus at MIT's Sloan School of Management, but this year the conference had grown so much that it was moved to the Massachusetts Convention and Exhibition Center near Boston Harbor. The conference's growth has mirrored the expansion of analytics research beyond baseball: last year there were 300 participants, and this year there were over 1000, with a 400-person waiting list. There is quite a mix at this conference: graduate students in business, engineering, or math; professionals in finance or other technical fields; officials from sports teams (all of the Big Four sports in the USA, plus golf, soccer, and rugby); ESPN personalities; and other members of the press.
With the move to a much larger venue, I wondered why there still had to be a waiting list, and several people whom I talked to at the conference wondered the same thing. The main reason is that the event is organized entirely by students within MIT Sloan, with some assistance from the Institute administration and Daryl Morey, GM of the NBA's Houston Rockets (and a Sloan MBA alumnus). MBA students are always going to be busy, and putting on a 100-person conference (let alone a 1000-person one!) is a huge logistical and organizational task. But the students did a very good job and secured some high-profile sponsors, such as ESPN, EA Sports, and several consulting and financial firms. I've participated in many conferences run by professional societies with larger organizational structures, and the SSAC compares very well with them.
Unlike other conferences I've attended, which have some keynote addresses (plenary sessions), a few panel discussions, and concurrent paper sessions, the SSAC consisted almost entirely of concurrent panel discussions plus a single paper session: the four papers in the conference paper contest and four additional invited papers. The benefit of this approach was that it focused the conference on a high-level survey of the current state of sports analytics from the perspective of both creators and users. There was talk about specifics, especially in the baseball and basketball analytics sessions, but it didn't descend into the minutiae of formulas and distributions.
I spent most of my time in three panel sessions. The first was the Emerging Analytics session, which featured Michael Forde, Director of Football Operations at Chelsea; Simon Wilson, Director of Performance Analysis at Manchester City; Paraag Marathe, Executive VP of Football/Business Operations of the San Francisco 49ers; and Aaron Schatz, editor of the website FootballOutsiders.com. That session was standing room only (I had to stand along the walls of the conference room), with overflow seating and flat-panel monitors outside the room. The session was moderated by Kate Fagan, a reporter with the Philadelphia Inquirer, who questioned the panel over the 80-minute session and fielded four or five questions from the audience. She asked good questions for the most part, but she wasn't especially knowledgeable about soccer, and a couple of times there were follow-up questions I would have liked to see posed to Wilson and Forde that never came.
All of the panelists were asked how analytics is currently used in their sport. Michael Forde, in his rich northern English accent, noted what I felt was a significant statistic: not only are 60% of the players in the English Premier League from outside the UK, but 60% of the ownership and coaching staffs are as well. That represents a multitude of cultures and perspectives in the English domestic game. For Forde, the main uses of analytics are on the economic side of the game, such as player contracts, the transfer market, and risk management. With billions of dollars flowing in and through the Premier League, the overriding concern is ensuring that clubs extract the best possible performance from their talent. Simon Wilson felt that the English game had become more accepting of analytics over the last five years, but that there was still a long way to go. Marathe agreed on the value of analytics for risk management and player evaluation, and added that American football needs end-game analytics, which aren't quite the same as in soccer because of the different natures of the two games. Aaron Schatz felt that analytics is indeed making inroads in the NFL, but that the sports media is still quite ignorant of it, and he gave the fourth-down call in the Patriots-Colts game as an example. I would argue that the sports media, and the broader media in general, have a very poor understanding of statistics and mathematics as a whole, and are even more clueless about analytics as a result.
A lot of the discussion in the panel centered on two events: the Patriots-Colts game from the recent NFL season, in which New England failed to convert a fourth-down attempt deep in its own territory late in the game, and the world-record transfer of Cristiano Ronaldo from Manchester United to Real Madrid. The second issue was posed to Forde and Wilson, and Forde used it to illustrate the difficulty of coming up with rational approaches to player valuation. (Rational analysis of irrational processes: common in finance and sport.) It's very difficult to use analytics to arrive at a market rate when there are club presidents willing to pay wildly inflated fees to bring a featured player to their team. In Real Madrid's case there are on-field and off-field revenues to consider, but with the player retaining his image rights, it's difficult to imagine Real ever coming out ahead on the deal. Wilson said that even in such an environment, clubs that are prudent and rational can be successful by making smart choices in the transfer market. It's somewhat ironic that Wilson said that, considering how wealthy Manchester City's ownership is, but Forde did show through his tenure at Bolton that such an approach can work.
The biggest hindrance to the development of effective analytics is the lack of suitable data. All of the panelists commented extensively on this issue, and it was a theme I heard multiple times during the conference. Forde commented that there aren't sufficient data on the physical aspect of football (all forms, actually, but especially soccer), where an outfield player is expected to run 15-20 km in a match and perform technical skills at any point along the way. In that respect, performance analysis encompasses not just passing completion rate or efficiency or even a plus/minus rating, but also the influence of physiological and psychological factors (Forde's psychology background helps him recognize the latter). Video data are especially limited, and generating them is complicated by the nature of the two sports: American football is very system- and personnel-dependent, while association football allows free movement of players (less so goalkeepers) and flexibility of formations and tactics. (There is tactical flexibility in American football, but to give one example, you'll rarely see a defense shift from a 3-4 to a 4-3 during a game.)
Of course, data have to be not just suitable but also useful. The American football panelists brought this point to the forefront, and the soccer panelists supported it. Marathe said that it is difficult to measure how effective a player really is because of "covariance": some statistics may look either very impressive or unimpressive because of the surrounding players, which often leads to poor personnel selections. He estimated that perhaps two-thirds of free-agent signings in the NFL are not fruitful; quarterback and running back signings tend to be more successful than signings at other positions. Incorporating new players is also very difficult because the data are poor, highly coupled, or simply nonexistent. That last issue is more relevant in soccer, where players are often recruited from South America or Africa to the top-flight European leagues. Of course, data are most useful when they are unbiased, and they're even better when they're actually used. Marathe cited the NFL combine as one example: football players are evaluated through what's essentially an athletics meet, and then the data are largely forgotten at draft time. He also said that the first round of the NFL draft is often a beauty contest: team management will fall in love with a player and then collect as much data as possible to support a decision that has already been made. (Aaron Schatz laughed out loud when he heard this, as if something he had always believed was being confirmed by an NFL team official.)
Several factors complicate the development of performance metrics in soccer. There is the intersection of player and team cultures in the sport, and also competition from other leagues. NFL analysts have to figure out how statistics from college football translate to the professional game, but soccer analysts have to do that for multiple leagues around the world as well as the lower divisions. (I did find it interesting that Forde listed his competition as league sides in Italy, Spain, and France, but didn't mention Germany.) One important thing to remember is that the players and coaches are the central members of a soccer organization, and analytics should be developed and used to help them do their jobs as well as possible.
After the panel session I was able to chat briefly with Wilson, Forde, and Jeff Agoos of the New York Red Bulls about the state of soccer analytics and what they felt were the big issues. The one thing I heard from everyone was data: one can never have too much, and right now soccer has very little in certain situations. At the same time, one wants to be sure that all the gold has been mined from the available data, and it's not clear that that has happened. I also heard that soccer analytics as a whole isn't very mature; some approaches are being taken at a few clubs in the USA and England, but there is a long way to go to match the impact analytics has had on baseball and basketball.
So, in conclusion, there are both problems and opportunities in soccer analytics. The Emerging Analytics session was very useful for learning what practitioners in the field believe are the main issues and for finding out what researchers in other sports are doing. And of course, the networking opportunities were extremely valuable. To all those I met at the conference: it was a pleasure meeting you, and I look forward to staying in touch. This is a very exciting time in the field, and I'm glad to be a part of it.