Hello from Boston, where I’ve been covering the MIT Sloan Sports Analytics Conference. It’s time for the Soccer Analytics panel and I’m cautiously optimistic about its prospects.
Here’s how the session went down. These comments are paraphrased and written in the moment, so 100% coverage and accuracy are not guaranteed. The panelists:
Andrew Wiebe’s questions are in boldface.
Knutson: Can get data down to League One; the challenge is to marry tactical data with tracking data
Almstadt: Issue is who is looking and analyzing the data — is it a “football person” or a “data/math person”? You really want to have both.
What’s next with data?
Stenz: Looks like people are chasing the “big data unicorn” looking for the big answers, which is not possible at the moment
Merging of coaching and IT. Have to look at first level and understand quality of data. “Can play numbers game but need to make sure the numbers are right”
How do you balance sample size against takeaways?
Smith: Good relationship between coach and data analysts matters. Comes down to clear understanding of what coach wants/expects. Data helps with filtering of metrics, especially with international players, and winnows the field to scout effectively
Knutson: “When we were at Brentford, the scouts were surprised with the quality of players we were sending.” More of an augmentation to the way football is done rather than a replacement.
Smith: The quality of players referred increases credibility with coaching staffs
How do you balance universality of data sets with specificity requirements of internal data?
Almstadt: Arsenal bought StatDNA, who have their own specialized data set. Arsenal has a specific style of play that requires a certain set of players (understanding the way own team plays is important)
Stenz: Definition of club data has to relate to club philosophy; it really comes down to definitions. All key performance stats collected in house. There are hard definitions, like corner kicks, and “soft” definitions, such as duels (interesting because Opta has their own definition of duel). German football has its own understanding of duels
Knutson: What can we do ourselves that is easy and quick, and what can we do that can scale across leagues?
(By the way, love Paddy’s socks!)
Smith: If we are expecting a stat that is equivalent to WAR in baseball, it’s not going to happen
Knutson: We don’t really need that. Not looking for huge wins, but small wins add up
What is cutting edge in ability to combine data sets?
Knutson: In clubs, you need tracking and event data, in matches and in training. Training data is much bigger than game data — can use them to build much better models. Takes seven years of game data to determine who is a good shooter. With training data can do it much more quickly.
Stenz: Need to remember that game data is so limited, and training data is comparing apples-to-oranges. Still need to do research on the data
Almstadt: Ghosting (which was presented in Research Paper competition) is fascinating, but there is still a lot of advancement to be had from relatively simple analysis (set pieces, shot location)
Knutson: Lots of improvements available from execution of little things. So much is left on the table.
Smith: Marginal improvements matter, commitment from stakeholders/owners matters
Knutson: Have to emphasize that we (analysts) don’t have all the answers and are still learning. Seems like most of our work is aimed at next generation of coaches
Stenz: Challenge to breed such a culture and get buy-in from coaches, players
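An aside on Knutson's point that game data alone takes years to identify a good shooter: the rough sketch below estimates how many shots it takes before a better-than-average conversion rate stands out from noise. It uses a simple binomial standard-error argument, and every number in it (the 10% and 12% conversion rates, the 100 shots per season) is my own hypothetical figure, not something quoted on the panel.

```python
import math


def shots_needed(p_avg, p_good, z=1.96):
    """Approximate number of shots before a shooter converting at p_good
    sits z standard errors above the average rate p_avg.

    Treats each shot as an independent coin flip with the same probability,
    which ignores shot location and quality (a deliberate simplification).
    """
    spread_per_shot = math.sqrt(p_good * (1 - p_good))
    return math.ceil((z * spread_per_shot / (p_good - p_avg)) ** 2)


# Hypothetical figures, chosen only for illustration.
n = shots_needed(p_avg=0.10, p_good=0.12)
shots_per_season = 100  # assumed volume for a busy shooter
print(f"shots needed: {n}, seasons at {shots_per_season} shots/season: {n / shots_per_season:.1f}")
```

Under these assumptions the answer runs to roughly a thousand shots, which is many seasons of match play. Training sessions generate shots far faster, which is the appeal Knutson described.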
Actionable gains in clubs?
Smith: Change in how players are identified — need to understand what coach wants. What is the coach’s philosophy, what is the game model? Use those to rank and filter players
Almstadt: Player recruitment, identification, and scouting. Gave example of using stat analysis to challenge scouting assessment of a player, which ultimately expanded the pool of players considered.
Knutson: Football is a very subjective, emotional business, so being able to talk about data is very important.
Where are there opportunities to gain on the field?
Knutson: Set pieces. Best for clubs to devise their own strategy/style, and focus on exploiting that as much as possible. Analysis of defense and adjustment to opposition are hugely exploitable opportunities that can provide an edge. Can only assess this reliably in training. Basic stuff, but have to know that (1) opportunities exist, and (2) can execute on them.
What numbers do coaches want to see?
Stenz: Don’t have much time to make data adjustments at halftime, but need to provide numbers that indicate competitive advantage.
Almstadt: Coaches want to see two or three major metrics that convey competitive advantage. Takes time to understand what coach wants — usually takes three years, and PL managers last 18 months (median).
Smith: Need to keep it simple in communication with players and managers
What research is being done on the likelihood of scoring from a shot?
Knutson: Coolest research right now is in expected passing. Over 1000 passes in a match, can build a robust model. Passing ability models (not sure I’ve written that right). Work in progress, but moving to understanding football as “a passing game with shots at the end”
Stenz: Waiting for RFID data, ability to combine training data. Injury prediction models are the next big thing.
Knutson: What about comparing young players to players they aspire to — understanding how model players played and knowing where current players are. No one is doing that now.
How are clubs adjusting to the modern ownership era?
Stenz: Whitecaps generating large amounts of money, need to be committed to generating in-house knowledge independent of manager
Smith: Use of data by ownership group is much more natural in North American sport than European sport
Knutson: Relays story from club owner who said that he’d love to see a club be run entirely by data
Almstadt: All data went to Wenger, who is primary decision-maker
Was Leicester City a statistical outlier or a betting one?
Knutson: Yes it’s a bit of a fluke, and this year’s performance is actually underperforming club fundamentals. Thirty-eight matches is a very small sample size, not always recognized by the public.
Almstadt: Underlying performance by a club is key to understand, prevents overreacting.
Almstadt: Biggest asset of StatDNA is that there is a professional analytics service in-house and can think long-term.
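To put Knutson's remark about 38 matches being a small sample into numbers, here is a minimal sketch of how widely a season's points total can swing for a team whose underlying quality never changes. It assumes every match is independent with the same win/draw/loss probabilities, and the probabilities themselves (45% win, 25% draw) are hypothetical values I picked, not figures from the panel.

```python
import math


def season_points_spread(p_win, p_draw, matches=38):
    """Return (expected points, standard deviation of points) over a season.

    Assumes independent matches with fixed probabilities and the usual
    3 points for a win, 1 for a draw, 0 for a loss.
    """
    mean_per_match = 3 * p_win + 1 * p_draw
    second_moment = 9 * p_win + 1 * p_draw  # E[points^2] for one match
    var_per_match = second_moment - mean_per_match ** 2
    return matches * mean_per_match, math.sqrt(matches * var_per_match)


# Hypothetical team strength, for illustration only.
mean_pts, sd_pts = season_points_spread(p_win=0.45, p_draw=0.25)
print(f"expected points: {mean_pts:.1f}, standard deviation: {sd_pts:.1f}")
```

With these made-up inputs the expected total is about 61 points with a standard deviation of about 8, so the same team could plausibly finish anywhere from the mid-40s to the high 70s on luck alone. That is the overreaction risk Almstadt mentions.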
How do outside resources help?
Smith: Can use expertise from KSC (Kroenke’s ownership group), and helps to have someone from outside soccer look at data. Long-term commitment is vital. Challenge is to translate performance in one league to another, develop expected performance.
Knutson: The more leagues you have access to, the cheaper the players you can find. Need to be open to outliers
Stenz: Comes down to long-term commitment by club. Hopes that the days of 100% staff turnover when a manager/sporting director joins a club are over. Example of Borussia Dortmund, who continue to find talent after departure of major players and managers. (Sevilla is another example)
Knutson: Institutional knowledge is extremely important, and rebuilding is expensive. Continuity of style saves money (gets agreement from Smith)
Stenz: We still haven’t identified the role of the analyst in the club. Most analyst work is long-term based, but most report to the manager. What happens when the numbers don’t match with the expectations of the manager? Not good. “Clubs spend millions on managers, hundreds of thousands for assistants, and tens of thousands for analysts.”
Knutson: Football clubs are hypercompetitive on managers and players, but not on analysts. That will change.
And that’s it!