Recently, I've spent a considerable amount of time on this site discussing team and player metrics, along with some work on certain tournament formats, but it is clear that I need a better approach to collecting and storing match data. I've been able to demonstrate the efficacy of some of these metrics by collecting and organizing the relevant raw data by hand and then doing the required analysis. That approach is absolutely brutal: extremely time- and labor-intensive, and difficult if not impossible to reuse for future projects. For an effort such as the adjusted plus/minus in soccer, hand collection of data is unrealistic and highly prone to error.
Several months ago I presented some preliminary work on a database design for match results in soccer, and I received some valuable feedback from readers. I tried to implement that design in one of the database software packages, but I ended up making a hash of it and set it aside. I am revisiting it now because I really need to make some progress on match database development. I've read (and am still reading) a couple of books on database design, and I've also taken a tutorial on database implementation for the software on my machine (I use OpenOffice Base). For the moment I want to take a step back from the talk about metrics and analytics and instead focus intensely on developing a good database design and then a good implementation. I'm off from my day job on Friday, so I'd like to make some solid progress by Sunday night.
In my previous post, Dave suggested that I consider what's called 'crowdsourcing' to develop match result databases of the various leagues and competitions. For those who aren't aware of the term, crowdsourcing involves companies and organizations outsourcing certain tasks to a community of users. I have said in the past that I do see some opportunities for collaborating with the virtual community on data collection, but I'll need to think such an approach through seriously and thoroughly. I would like to develop the database template first, before I seek out a network of users to collect data. Also, these databases will become very valuable in their own right, so I will have to consider some sort of financial compensation for contributors if I end up using them in my proposed business venture.
I can already envision a number of financial and legal issues that will have to be navigated in order to make crowdsourcing worthwhile for me and for potential collaborators.