I want to post an update on the two MCFC Analytics-related projects that we at Soccermetrics initiated a few weeks ago.

If you’ve been reading recent posts, you’ll know that I’ve developed a data schema (essentially a set of database table definitions) that handles summary statistics such as those in the ‘lite’ dataset. I’ve set up a GitHub project to hold all development work and documentation, and I’ve started writing a program that converts the dataset into a database built on that schema. A lot is going on this week and next, so my time is limited, but I have plenty of previously written code that I can drop into this application, so it may not take long to write.

I’ve also found referee data for every Premier League match last season, so we can associate referees with matches and track the actions within those matches. It’s relatively easy to find match timings for goals and substitutions; the challenge is how to incorporate them into a database efficiently. If anyone has ideas, please contact me.

Now, I’m aware that not everyone has the skills or desire to manipulate a database through SQL, much less write software in Python, Ruby, or whatever, so I’ve been thinking about the best way to deliver a converted dataset. One way is to generate one CSV file for each of the nine categories, so that instead of a huge table of 200+ columns, there would be smaller sheets of 10-30 columns each. Would that be simpler for people? I don’t know.
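For anyone curious what that split might involve, here is a minimal sketch in Python, assuming the wide table is a single CSV with a header row. The category names, column names, and the `split_wide_csv` function are all hypothetical placeholders rather than the actual ‘lite’ dataset headers; the idea is simply that each narrow file keeps a couple of key columns so the sheets can be rejoined later.

```python
import csv
import io

# Hypothetical key columns repeated in every narrow file so rows can be rejoined.
KEY_COLUMNS = ["Player ID", "Match ID"]

# Hypothetical mapping from category name to the wide-table columns it owns.
# The real dataset has nine categories; only two are sketched here.
CATEGORIES = {
    "passing": ["Successful Passes", "Unsuccessful Passes"],
    "goals": ["Goals", "Shots On Target"],
}


def split_wide_csv(wide_file, categories, key_columns):
    """Read one wide CSV and return {category: CSV text}, one narrow table per category."""
    reader = csv.DictReader(wide_file)
    buffers = {}
    writers = {}
    for name, columns in categories.items():
        buffer = io.StringIO()
        # extrasaction="ignore" silently drops columns that belong to other categories.
        writer = csv.DictWriter(buffer, fieldnames=key_columns + columns,
                                extrasaction="ignore")
        writer.writeheader()
        buffers[name] = buffer
        writers[name] = writer
    for row in reader:
        # Each wide row is written once to every category table.
        for writer in writers.values():
            writer.writerow(row)
    return {name: buffer.getvalue() for name, buffer in buffers.items()}


if __name__ == "__main__":
    sample = io.StringIO(
        "Player ID,Match ID,Goals,Shots On Target,Successful Passes,Unsuccessful Passes\n"
        "1,10,1,3,40,5\n"
    )
    tables = split_wide_csv(sample, CATEGORIES, KEY_COLUMNS)
    print(tables["goals"])
```

Writing each returned string to its own file (for example `goals.csv`, opened with `newline=""`) would give the set of smaller sheets described above.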

Another way is to write a web-based application that outputs selected summary statistics for a player over the entire season or a subset of matches. Perhaps there could be a mashup with one of the JavaScript libraries to generate some interesting data viz (yes, I know about Tableau, but I use Linux and it doesn’t run there).

The bottom line is that there are some initial applications I could build, but they won’t be perfect and they won’t appeal to everyone.  But I do want to put something in front of people so that they can comment on it and propose changes or new features.

I do want to emphasize that this project is for the greater community that has formed around MCFC Analytics, and we want to build something that helps people draw better and more creative insights from this initial dataset. The Performance Analysis team at MCFC is also invited to contribute, or at least observe; I know that there are some software developers at the club.

Again, the site for the two projects is located here, and I’ll populate it with something usable by the weekend, assuming that I’m not fully consumed by pitch preparations.

E-mail us if you want to talk about the analytics you are working on, or if you want access to the data using the Soccermetrics Football Match Result and Match Event database schemas.

