Data ingestion tools now available in Marcotti repo

For the last couple of months, I’ve been updating and modernizing the Marcotti data schemas that Soccermetrics uses to build its match databases.  It’s good work, but a database is only useful once it holds data, and that data should be loaded in a systematic and reliable way.

Over the years, I have written a lot of scripts and modules to ingest CSV and XML data into MCFC Analytics and Soccermetrics Connect databases, but I haven’t made them public until now.  I’ve created a submodule within the Marcotti repository called ingestion which contains code that reads CSV files and inserts the contents into the database.  It’s more of an ETL tool than a straight ingestion tool, but I’m not going to be too pedantic about it.
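To give a rough idea of what "more ETL than ingestion" means, here is a minimal sketch of the extract/transform/load pattern. It is not the code in the ingestion submodule: the column names, the `matches` table, the function names, and the placeholder clubs are all made up for illustration, and it uses the standard library's `sqlite3` rather than the Marcotti schemas.

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout and placeholder rows; the real ingestion code
# defines its own field names and targets the Marcotti schemas.
SAMPLE_CSV = """Date,Home Team,Away Team,Home Goals,Away Goals
2020-01-01, Club A ,Club B,0,0
2020-01-02,Club C,Club D,3,2
"""


def extract(handle):
    """Yield one dict per CSV row, keyed by the header names."""
    yield from csv.DictReader(handle)


def transform(row):
    """Trim whitespace and convert goal counts from text to integers."""
    return (
        row["Date"].strip(),
        row["Home Team"].strip(),
        row["Away Team"].strip(),
        int(row["Home Goals"]),
        int(row["Away Goals"]),
    )


def load(conn, records):
    """Insert the cleaned records in a single batch."""
    conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()


def ingest(conn, handle):
    """Run extract -> transform -> load and return the number of rows stored."""
    records = [transform(row) for row in extract(handle)]
    load(conn, records)
    return len(records)


# Illustrative in-memory database with a made-up table definition.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches ("
    "match_date TEXT, home_team TEXT, away_team TEXT, "
    "home_goals INTEGER, away_goals INTEGER)"
)
count = ingest(conn, io.StringIO(SAMPLE_CSV))
```

The transform step is what separates this from a straight ingestion: values are cleaned and converted before they reach the database.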

I have used this code before, so it should work, but I haven’t tested it in this environment, so there will almost certainly be bugs and typos.  It also requires a lot of supporting data in order to place match data in their proper context.  The code will be refactored quite heavily.  Currently you can ingest data from club league matches, and that functionality will be extended in the future.  The code assumes specific field names in the CSV data, which is unavoidable; it would be nice to permit some customization of those fields, but that requires a lot more effort than I am willing to devote.
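For readers wondering what assumed field names look like in practice, the sketch below checks a CSV header against a fixed set of expected columns and, optionally, renames columns through an alias map. This is an assumption about how such a check could be written, not what the ingestion submodule does: `EXPECTED_FIELDS`, `read_rows`, and the `aliases` parameter are hypothetical names.

```python
import csv
import io

# Hypothetical set of columns the reader insists on.
EXPECTED_FIELDS = ("Date", "Home Team", "Away Team")


def read_rows(handle, aliases=None):
    """Read CSV rows, renaming columns via `aliases` and failing fast
    if any expected field is absent from the header."""
    aliases = aliases or {}
    reader = csv.DictReader(handle)
    header = [aliases.get(name, name) for name in (reader.fieldnames or [])]
    missing = [field for field in EXPECTED_FIELDS if field not in header]
    if missing:
        raise ValueError(f"CSV is missing expected fields: {missing}")
    # Apply the same renaming to every row so downstream code sees
    # only the expected names.
    return [{aliases.get(key, key): value for key, value in row.items()}
            for row in reader]


# Placeholder file whose headers differ from the expected ones.
text = "Date,Home,Away\n2020-01-01,Club A,Club B\n"
rows = read_rows(io.StringIO(text),
                 aliases={"Home": "Home Team", "Away": "Away Team"})
```

Without the alias map, the same file is rejected with a `ValueError`, which is roughly what a user hits today if their CSV headers differ from the ones the code expects.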

Again, the link to the Marcotti repository is here.
