An update on projects and software

About a month or so ago on Twitter, I wrote that it was past time for me to provide an update on some ongoing projects.  Unfortunately that will have to wait a while as some projects have been submitted as proposals to the MIT Sloan Sports Analytics Conference and the OptaPro Analytics Forum.  I know that members of the reviewing panels for both meetings read this site occasionally, so I can’t be more specific about the projects at this time.

I can write about a couple of other projects that I haven’t submitted to a conference.  There will be a 2015 version of MLS front-office efficiency ratings, and the 2015 projections for Major League Soccer’s regular season will be reexamined after the final round of matches next weekend.  The efficiency ratings have undergone significant revisions over the last two seasons, and they’ve finally reached some level of maturity.  The league projections are on the verge of being heavily revised as I transition to expected goal models that are more under my control.  It’s my hope that these changes will be ready to go by the spring, but it all depends on time available to me.  At least I don’t plan on moving anytime soon.

The most significant update this summer has been to the data models that undergird Soccermetrics’ analytics software.

[Warning: If you don’t want to read me nerd on about software, feel free to skip the rest of the post.]

In the early days of Soccermetrics, I designed the data models in pure SQL (with some Postgres customizations) and wrote my own software tools to handle basic operations and more complex actions.  They did the job, but the tools were brittle and didn’t scale.  My software development skills have grown a lot since then, and I’ve used SQLAlchemy to rewrite the schema definitions as a collection of Python classes that describe the data models, with Alembic tracking changes to the models and handling migrations.  SQLAlchemy interfaces with a wide range of relational database engines, and its large collection of enterprise-level tools simplifies the codebase greatly.
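For readers who want a concrete picture of "schema definitions as a collection of Python classes", here is a minimal sketch.  The `Club` and `Player` classes, their columns, and the in-memory SQLite engine are all made up for illustration; they are not the actual Marcotti models.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

# Every model class inherits from Base, which collects table metadata.
Base = declarative_base()


class Club(Base):
    """Hypothetical club table, for illustration only."""
    __tablename__ = "clubs"

    id = Column(Integer, primary_key=True)
    name = Column(String(60), nullable=False, unique=True)
    players = relationship("Player", back_populates="club")


class Player(Base):
    """Hypothetical player table linked to a club by foreign key."""
    __tablename__ = "players"

    id = Column(Integer, primary_key=True)
    full_name = Column(String(100), nullable=False)
    club_id = Column(Integer, ForeignKey("clubs.id"))
    club = relationship("Club", back_populates="players")


# An in-memory SQLite database stands in for Postgres in this sketch.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

with Session() as session:
    club = Club(name="Example FC", players=[Player(full_name="A. Example")])
    session.add(club)  # the related Player is saved along with the Club
    session.commit()
    player_names = [
        p.full_name for p in session.query(Player).order_by(Player.full_name)
    ]
```

The sketch calls `create_all` to keep things short.  In a project managed with Alembic, schema changes would normally go through migration scripts instead (for example, ones generated with `alembic revision --autogenerate`).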

I implemented these changes on the Marcotti database that supports the player career forecasting system and I’ve liked the results, so I will apply the same changes to the rest of the Marcotti database models.  There are four different granularities of Marcotti, and I will probably revise (and open-source) Marcotti-Light first.  Beyond that, I haven’t decided on the order of work yet.  Looking further down the road, my goal is to consolidate the Marcotti repositories into a single repo that contains model class definitions at each level of granularity (as well as club vs national team), and then define a database schema as a collection of classes, but that is far, far down the roadmap.  (If you understood what I meant, fantastic; if you have no idea what I was talking about, don’t worry about it.)
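One way to picture "a database schema as a collection of classes" is sketched below.  This is only a guess at the general shape, not the planned design: the mixin, the two declarative bases, and every column name are assumptions made for illustration.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base


class MatchMixin:
    """Columns shared by every granularity level (hypothetical)."""
    id = Column(Integer, primary_key=True)
    home_team = Column(String(60), nullable=False)
    away_team = Column(String(60), nullable=False)


# Each declarative base owns its own metadata, so each acts as one schema.
LightBase = declarative_base()
DetailedBase = declarative_base()


class LightMatch(MatchMixin, LightBase):
    """Coarse-grained match record: shared columns plus the final score."""
    __tablename__ = "matches"
    home_goals = Column(Integer)
    away_goals = Column(Integer)


class DetailedMatch(MatchMixin, DetailedBase):
    """Finer-grained match record: shared columns plus extra match detail."""
    __tablename__ = "matches"
    attendance = Column(Integer)
    referee = Column(String(60))


light_columns = set(LightBase.metadata.tables["matches"].columns.keys())
detailed_columns = set(DetailedBase.metadata.tables["matches"].columns.keys())
```

Both schemas reuse the shared columns from the mixin, and each adds only the fields that its level of granularity needs.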

The next step is to refactor the current analytics toolkit so that it is decoupled, as much as possible, from the data models that supply it with data.  That task will require a lot more work at both the design and implementation levels.
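To give a rough idea of what that decoupling could look like, here is a small sketch in which the analytics code depends only on a narrow interface and not on any database model.  The names (`MatchResult`, `MatchSource`, `InMemorySource`, `points_table`) are hypothetical and do not come from the actual toolkit.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Protocol


@dataclass(frozen=True)
class MatchResult:
    """Plain value object handed to the analytics layer."""
    home_team: str
    away_team: str
    home_goals: int
    away_goals: int


class MatchSource(Protocol):
    """Anything that can yield match results: a database, a CSV file, a test stub."""
    def results(self) -> Iterable[MatchResult]: ...


class InMemorySource:
    """Stand-in data source; a database-backed source would expose the same method."""
    def __init__(self, rows: Iterable[MatchResult]) -> None:
        self._rows = list(rows)

    def results(self) -> Iterable[MatchResult]:
        return iter(self._rows)


def points_table(source: MatchSource) -> Dict[str, int]:
    """Compute league points (3 for a win, 1 for a draw) from any MatchSource."""
    table: Dict[str, int] = {}
    for match in source.results():
        table.setdefault(match.home_team, 0)
        table.setdefault(match.away_team, 0)
        if match.home_goals > match.away_goals:
            table[match.home_team] += 3
        elif match.home_goals < match.away_goals:
            table[match.away_team] += 3
        else:
            table[match.home_team] += 1
            table[match.away_team] += 1
    return table


demo = InMemorySource([MatchResult("A", "B", 2, 1), MatchResult("B", "C", 0, 0)])
table = points_table(demo)
```

Because `points_table` only knows about `MatchSource`, the data models behind it could change, or be swapped out entirely, without touching the analytics code.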

I suppose the takeaway from all this is that soccer analytics has evolved to the point where I’m feeling more comfortable with the idea of open-sourcing more of these projects than I have been previously.  Stay tuned as I figure out which projects to open up first.