The ‘lite’ dataset released by the MCFC Analytics project last weekend is a massive dump of almost 200 summary in-match metrics for every player who participated in every match of the 2011-12 English Premier League — more than 10,000 records and 2 million data cells. We set out to convert that dataset into something more structured and amenable to our analytics software. The result is the FMRD-Summary database.
The FMRD-Summary database schema models data that are less granular than the Football Match Event Database, but more detailed than the Football Match Result Database. It captures much of the same data as the Football Match Event Database, minus the Venue and Venue History tables. Its main feature is that it records in-match summary statistics at a very fine level of detail, in particular by field position, body part, and type of open or set play. The schema is built for those who track summary statistics in football matches as well as historical match data.
The current schema only models data from league competitions, but extensions to capture data from knockout and/or hybrid competitions can be added if enough people request it. We have added a Seasons table to simplify the Competitions table a bit (that change will eventually make it to the other FMRD schemas).
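To make the Seasons change concrete, here is a minimal sketch, written with Python's built-in sqlite3 module, of one way a separate Seasons table can sit beside Competitions, with each match pointing at both. The table and column names (tbl_seasons, tbl_competitions, tbl_matches and so on), the helper functions, and the sample values are hypothetical. They are not the actual FMRD-Summary DDL, which lives in the repository.

```python
import sqlite3

# Hypothetical DDL: names are illustrative, not the real FMRD-Summary schema.
# Season data gets its own table, so a competition row no longer repeats it.
DDL = """
CREATE TABLE tbl_seasons (
    season_id   INTEGER PRIMARY KEY,
    season_name TEXT NOT NULL UNIQUE
);
CREATE TABLE tbl_competitions (
    comp_id   INTEGER PRIMARY KEY,
    comp_name TEXT NOT NULL UNIQUE
);
CREATE TABLE tbl_matches (
    match_id   INTEGER PRIMARY KEY,
    comp_id    INTEGER NOT NULL REFERENCES tbl_competitions(comp_id),
    season_id  INTEGER NOT NULL REFERENCES tbl_seasons(season_id),
    match_date TEXT NOT NULL
);
"""


def build(conn: sqlite3.Connection) -> None:
    """Create the toy tables and insert one match (placeholder values)."""
    conn.executescript(DDL)
    season_id = conn.execute(
        "INSERT INTO tbl_seasons (season_name) VALUES (?)", ("2011-12",)
    ).lastrowid
    comp_id = conn.execute(
        "INSERT INTO tbl_competitions (comp_name) VALUES (?)", ("Premier League",)
    ).lastrowid
    conn.execute(
        "INSERT INTO tbl_matches (comp_id, season_id, match_date) VALUES (?, ?, ?)",
        (comp_id, season_id, "2011-08-13"),
    )
    conn.commit()


def matches_in(conn: sqlite3.Connection, comp_name: str, season_name: str) -> int:
    """Count matches in one competition-season by joining both lookup tables."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM tbl_matches AS m
        JOIN tbl_competitions AS c ON c.comp_id = m.comp_id
        JOIN tbl_seasons AS s ON s.season_id = m.season_id
        WHERE c.comp_name = ? AND s.season_name = ?
        """,
        (comp_name, season_name),
    ).fetchone()
    return row[0]
```

Because each season is its own row, a competition name is stored once no matter how many seasons are loaded.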
Besides the tables that are inherited from the Football Match Result Database schema, there are 33 tables that capture in-match statistics. Yes, it does sound like a lot and it does get quite involved, but it’s easier to deal with than 200+ columns on a sheet. We group these tables further into nine categories:
The schema is written for SQLite and lives in the Soccermetrics fmrd-summary repository on GitHub.
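Trading 200+ columns for 33 tables mostly comes down to splitting each wide player-match row by category. The Python sketch below shows one way to do that split. It assumes hypothetical spreadsheet headers and table names; KEY_MAP, STAT_MAP and split_row are illustrations, not part of the repository, and only a handful of columns are mapped.

```python
from typing import Dict, Tuple

# Hypothetical headers and targets; the real dataset has roughly 200 columns
# and the real schema has 33 statistics tables.
KEY_MAP: Dict[str, str] = {"Player ID": "player_id", "Match ID": "match_id"}
STAT_MAP: Dict[str, Tuple[str, str]] = {
    "Goals": ("tbl_goals", "total"),
    "Goals Inside Box": ("tbl_goals", "inside_box"),
    "Goals Outside Box": ("tbl_goals", "outside_box"),
    "Shots On Target": ("tbl_shots", "on_target"),
    "Shots Off Target": ("tbl_shots", "off_target"),
}


def split_row(row: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Turn one wide player-match row into one narrow record per statistics table.

    Every record carries the player/match keys so the tables can be joined back.
    Columns not listed in STAT_MAP are ignored; empty cells become 0.
    """
    keys = {KEY_MAP[col]: int(row[col]) for col in KEY_MAP}
    records: Dict[str, Dict[str, int]] = {}
    for column, value in row.items():
        target = STAT_MAP.get(column)
        if target is None:
            continue  # key columns and unmapped stats are skipped
        table, field = target
        # Start each table's record with a copy of the shared key columns.
        record = records.setdefault(table, dict(keys))
        record[field] = int(value) if value else 0
    return records
```

Each resulting record can be inserted into its own table, so a query about goals never has to wade through columns about passes or tackles.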
Over the weekend I’ll describe the database tables and their fields in a series of three posts (three categories each). I’ll set up a wiki on the repository and copy my descriptions over there.
Schemas are nice to have, but what we really want is a database built on this schema. So I will present some code that starts the process of loading the MCFC Analytics data into the database. I do want this to be an open source project, so contributions are very welcome.
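As a possible starting point for the loading step, here is a minimal sketch. It assumes a CSV export of the dataset with hypothetical headers (Player ID, Player Forename, Player Surname, Match ID, Goals) and a two-table toy schema rather than the real FMRD-Summary tables. It uses only Python's csv and sqlite3 modules; load is an illustrative name, not code from the repository.

```python
import csv
import io
import sqlite3
from typing import Dict, List, TextIO, Tuple

# Hypothetical toy schema: one lookup table plus one statistics table.
DDL = """
CREATE TABLE IF NOT EXISTS tbl_players (
    player_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tbl_goals (
    player_id INTEGER NOT NULL REFERENCES tbl_players(player_id),
    match_id  INTEGER NOT NULL,
    total     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (player_id, match_id)
);
"""


def load(conn: sqlite3.Connection, csv_file: TextIO) -> int:
    """Load player names and per-match goals from a CSV; return rows read."""
    conn.executescript(DDL)
    players: Dict[int, str] = {}
    goals: List[Tuple[int, int, int]] = []
    for row in csv.DictReader(csv_file):
        pid = int(row["Player ID"])
        players[pid] = f'{row["Player Forename"]} {row["Player Surname"]}'.strip()
        # Treat an empty Goals cell as zero.
        goals.append((pid, int(row["Match ID"]), int(row["Goals"] or 0)))
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR IGNORE INTO tbl_players VALUES (?, ?)", players.items())
        conn.executemany("INSERT OR REPLACE INTO tbl_goals VALUES (?, ?, ?)", goals)
    return len(goals)
```

With a real export you would open the file with open(path, newline="") and pass it to load. INSERT OR IGNORE keeps a player who appears in many matches from being inserted more than once.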