Today, I announce the release of the Marcotti-Events data schema on GitHub. Marcotti-Events (formerly known as Football Match Events Database) extends the Marcotti data schema to capture the micro events — touch-by-touch or technical/tactical data in soccer analytics parlance — that make up a football match. There are still some tweaks to be made as bugs reveal themselves during testing, but the schema is close enough to permit a release.
Several years ago — around the time that the MCFC Analytics project was being launched — I announced the creation of the Football Match Events Database schema. The plan was to make it publicly available as the touch-by-touch data sets became available, but in the end those plans were shelved as the MCFC Analytics project changed. I did continue to refine the schema and I used it to build databases for the Soccermetrics Connect API during the 2014 FIFA World Cup.
Initially, FMED/Marcotti-Events was kept private because it depended on the proprietary, highly granular data feeds created by the sports data companies. The data feeds remain proprietary, but the collection processes are similar across data companies, more is now known about these feeds, and more people in the football research community are using them. Moreover, football analytics has matured to the point that more researchers are willing to share the tools they use to model and analyze data, and this release reflects that attitude.
Marcotti-Events, like the other Marcotti data schemas, is a collection of data models written in SQLAlchemy. This allows us to exploit Python’s object-oriented functionality and add some helper expressions. As with the other schemas, there is a common collection of models along with club- and national-team-specific models that allow us to build separate databases for clubs and national teams. The models that capture high-level and personnel data are copied from Marcotti: Competitions, Persons, Venues, Matches, and so on. What makes Marcotti-Events different is that every event is a micro event and is captured in the following pattern:
- Events: All events are parameterized by time and, if applicable, the field coordinates at which the event occurs and the team to which the event is relevant.
- Actions: All events contain at least one action, which can be a touch event such as a pass, set-piece, foul, or goal, or a non-touch event such as the start or end of a period, an offside, or a substitution.
- Modifiers: All actions can be described further using modifiers, which have specific categories such as shot type, shot outcome or foul type. Some actions, such as offside or start/end period, do not have modifiers.
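To make the Events/Actions/Modifiers pattern concrete, here is a minimal sketch in plain Python dataclasses. It is not the actual Marcotti-Events code, which is written as SQLAlchemy models; every class name, field name, and sample value below is a hypothetical stand-in chosen for illustration, and the coordinate units are an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Modifier:
    # A category/value pair that qualifies an action,
    # e.g. category "shot outcome" with value "saved" (hypothetical values).
    category: str
    value: str


@dataclass
class Action:
    # What happened: a touch event (pass, shot, foul, ...) or a
    # non-touch event (start/end period, offside, substitution).
    action_type: str
    is_touch: bool = True
    # Some actions, such as offside, carry no modifiers at all.
    modifiers: List[Modifier] = field(default_factory=list)


@dataclass
class Event:
    # Every event is parameterized by time; field coordinates and the
    # relevant team are optional because they do not always apply.
    period: int
    minute: int
    second: int
    x: Optional[float] = None  # assumed units, illustration only
    y: Optional[float] = None
    team: Optional[str] = None
    actions: List[Action] = field(default_factory=list)

    def __post_init__(self):
        # Enforce the rule that an event contains at least one action.
        if not self.actions:
            raise ValueError("an event needs at least one action")


# A touch event: a shot described by two modifiers.
shot = Event(
    period=1, minute=23, second=41, x=88.5, y=46.0, team="Home",
    actions=[
        Action("shot", modifiers=[
            Modifier("shot type", "right foot"),
            Modifier("shot outcome", "saved"),
        ])
    ],
)

# A non-touch event: no coordinates, no team, no modifiers.
kickoff = Event(
    period=1, minute=0, second=0,
    actions=[Action("start period", is_touch=False)],
)
```

In a real schema each of these classes would presumably map to its own table, with foreign keys linking modifiers to actions and actions to events; the nesting here only mirrors that hierarchy in memory.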
So once again, the usual caveats about this project:
- The Marcotti data schemas are used to BUILD databases. They are NOT databases themselves!
- Use of these models requires knowledge of Python.
- If you feel that there are other values not listed in the data models that should be tracked, fork the repository and customize it yourself. You can even send a pull request, but expect requests to add betting-related fields (e.g. odds data) to be turned down. If you do wish to extend the schema in order to track betting data, let us know and we will post a link.
- A Wiki describing use cases will be available in the near future.
As I said in a previous post, this is part of a larger process to share more of the tools that Soccermetrics uses for its analysis, from data ingestion to data analysis and visualization. Your interest and contributions are appreciated and highly valued.
Once again, follow the link to find the Marcotti-Events data schema.