What I’m learning from building a database using FMRD

If you have been following my Twitter feed over the last ten days, I’ve been building a match result database for the 2012 Africa Cup of Nations using my Football Match Result Database schema.  I wanted to test my designs in the field, so to speak, using a competition that would exercise almost all of the major FMRD tables.  I’d like to share with you some of my experiences with entering and retrieving data to and from the database.

I collected AFCON data from publicly available sources around the Web, and collecting and cleaning data turned out to be a big challenge.  The reliability of the data for this competition is very inconsistent — player names had widely different spellings, there were phantom bookings,  players were substituted out of a match twice.  No one site had all of the data that FMRD is set up to collect.  And while you might think that the official AFCON website would have authoritative data, it turned out that their statistical summaries were worse than useless.

Resolving player names was also a major issue.  As expected, the Arab names were the most difficult as there appeared to be multiple spellings for every website.  Single-named players from Equatorial Guinea and Angola were also a challenge, particularly because some sites referred to the players by their full names while others used either their nickname or a combination of the two.  In situations such as these, it would be ideal to refer to the official squad lists, but the lists given by CAF were not at all consistent.  In the future I might add some fields to the Players table to reduce the chances of name collisions, such as the adding a middle/extra name field in the Players table or appending the birthdate to the full name field in the saved query.

Data collection issues aside, the database worked very well and the saved queries worked as expected.  (I had created a SQLite version of the database.)  Right now data entry remains a manual process, but the data entry tool removes most of the pain involved.  I did not use the tool to insert player data; instead, I aggregated the data into a spreadsheet, saved the sheet as a CSV file, and wrote a script to import the CSV file into the Players table.

While the data entry tool worked well and ensured relational integrity for the most part, there were some issues [skip ahead if you don’t want to read some inside baseball]:

  • Data in the Offenses, Penalties, and PenaltyShootouts tables are not saved properly.  When I tested the entry tool on a PostgreSQL database, the data were saved correctly, so I think this is an issue with SQLite databases.  The problem is that in the Player comboboxes I filter the names of players who appear in the match lineup for a specific team. The filter works, but some reason the PyQt setData() function call fails to save the lineup ID data to the database.  It comes down to an issue with the delegate (PyQt uses a model-view-delegate framework; I was able to create a workaround in the code but it creates a situation that could allow the user to insert inconsistent data.
  • In the match event tables, comboboxes are not reset if a new match is selected.  When you go through the competition phase details and select a specific match, one or more of the comboboxes are activated.  If you select another match and try to enter information specific to that match, the comboboxes and edit boxes are not reset.  Again, it creates a situation where inconsistent data can be inserted.  You can get around this by closing the dialog window and reopening it.  Yeah, it’s annoying, but it only adds an extra two seconds.
  • Overall, there are too many issues with the object-relational mapper to allow the data entry tool to work well with SQLite databases.  It really stinks because SQLite databases are easy to use and deploy, but there is a reason why I started out with a Postgres database.  It just means that I need to accelerate the development of an API to write to and read from the database. More details coming.

I’m caught up with the AFCON 2012 database and I’m having fun writing SQL queries and obtaining interesting results, some of which I’ve shared on my Twitter feed.  I do need to add a few more descriptions of goalscoring events and fouls, but overall I like the design a lot and am working on methods to extract compelling information from the database.