The modernization of the Soccermetrics data models continues. Early this morning I made a soft release of the Marcotti repository on GitHub, and I’d like to write a few words about that now.
Marcotti is a snazzier name for what used to be called FMRD — the Football Match Result Database. This is the original data schema that I created to capture and track the major events associated with a football match, such as the competition, the competing teams, match personnel, lineups, goals, bookable offenses, and substitutions. You can learn more about the design philosophy in this post from 2011.
As with Marcotti-Light, the SQL database tables are rewritten as Python classes using the SQLAlchemy library, which gives us access to a much larger, fully tested toolbox of database routines and lets us exploit Python’s object-oriented features. The current design allows one to build databases for either club matches or national team matches, with common data models shared by both types of database. Marcotti also comes with a much larger test suite.
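To give a feel for what "tables as Python classes" with shared models can look like, here is a minimal sketch. It is not taken from the Marcotti repository: the class names, table names, and columns (`MatchMixin`, `ClubMatch`, `NationalMatch`, and so on) are assumptions made up for illustration, and the real models are considerably richer.

```python
from sqlalchemy import Column, Date, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base

# One declarative base per database type (hypothetical names).
ClubBase = declarative_base()
NatlBase = declarative_base()


class MatchMixin:
    """Columns common to both kinds of match (illustrative only)."""
    id = Column(Integer, primary_key=True)
    match_date = Column(Date, nullable=False)
    competition = Column(String(80), nullable=False)


class Club(ClubBase):
    __tablename__ = "clubs"
    id = Column(Integer, primary_key=True)
    name = Column(String(60), nullable=False)


class ClubMatch(MatchMixin, ClubBase):
    """Club match: shared columns plus references to the two clubs."""
    __tablename__ = "club_matches"
    home_team_id = Column(Integer, ForeignKey("clubs.id"))
    away_team_id = Column(Integer, ForeignKey("clubs.id"))


class Country(NatlBase):
    __tablename__ = "countries"
    id = Column(Integer, primary_key=True)
    name = Column(String(60), nullable=False)


class NationalMatch(MatchMixin, NatlBase):
    """National team match: same shared columns, different participants."""
    __tablename__ = "national_matches"
    home_team_id = Column(Integer, ForeignKey("countries.id"))
    away_team_id = Column(Integer, ForeignKey("countries.id"))
```

The mixin is one plausible way to share columns between the two schemas while keeping their table sets separate; the repository may organize its common models differently.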
The usual caveats apply:
- The Marcotti data schemas are used to BUILD databases. They are NOT databases themselves!
- Use of these models requires knowledge of Python.
- If you feel that there are other values not listed in the data models that should be tracked, fork the repository and customize it yourself. You can even send a pull request, but expect requests to add betting-related fields (e.g. odds data) to be turned down. If you do wish to extend the schema in order to track betting data, let us know and we will post a link.
- A wiki describing use cases will be available in the near future.
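The first caveat is worth illustrating: a model only describes tables, and a database comes into being when the models are applied to a database engine. The sketch below assumes a made-up `Competition` model and an in-memory SQLite database. Neither comes from the Marcotti repository, and the `level` column stands in for the kind of custom field one might add in a fork.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Competition(Base):
    """Hypothetical model; the real Marcotti classes may differ."""
    __tablename__ = "competitions"
    id = Column(Integer, primary_key=True)
    name = Column(String(80), nullable=False)
    # A custom field of the sort a fork might add (assumption).
    level = Column(Integer)


# Defining the class above creates no tables anywhere.
engine = create_engine("sqlite:///:memory:")
tables_before = inspect(engine).get_table_names()

# Applying the model metadata to an engine is what builds the database.
Base.metadata.create_all(engine)
tables_after = inspect(engine).get_table_names()

# Once the tables exist, rows are ordinary Python objects.
with Session(engine) as session:
    session.add(Competition(name="Example League", level=1))
    session.commit()
    row_count = session.query(Competition).count()
```

Pointing `create_engine` at a different connection URL would build the same tables in another backend, provided the matching database driver is installed.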
Now that the Marcotti data models have been rewritten, it will be easier to do the same for Marcotti-Summary and Marcotti-Events. (Writing the test suites will still be laborious.) I wrote on Twitter that the expected release time for the last two schemas would be the middle of December, but after further thought about the work ahead I’ll be conservative and give a release date of mid-January.
I should emphasize that while the release of data models doesn’t mean very much by itself, it is part of a much broader effort to be more open about some of the work I do, from data ingestion to analysis and modeling algorithms to raw data. Your interest is appreciated and your contributions are valued.
Once again, you can find the Marcotti project on GitHub at this link.