Football Match Result Database: An Introduction

This morning I posted my specifications for the Football Match Result Database.  Tonight I'll discuss the design and explain my thinking behind some of the tables.

I'll start by repeating my original intent for the Database: to track historical league match results and associated statistics that were most commonly collected from the early 20th century until the present. Those statistics include the league competition, the two teams involved in a match, the players, managers, and referees involved, the goals scored, penalties and bookable offenses committed, and substitutions made.  Granted, caution/expulsion cards have only been used since 1970 and substitution rules have varied throughout the years, but those incidents have been recorded quite regularly. 

At the present time, the Database does not capture data for lower-level (I also call it micro) statistics such as shots, assists, saves, corners, tackles, passes, fouls, or free kicks.  The reason for not doing so is that these statistics have not always been collected consistently and outside the USA and western Europe are rarely collected.  If one wants to track these statistics by players, one would have to rely on play-by-play or video data which is not always available.  To this end I decided to focus on collecting data that would be readily available in as many matches and leagues as possible, and going back as many years as possible.  I do plan on adding functionality to collect these micro-statistics and track them by player, but I'm not sure when I'll get around to that.

So now, the Database.  The data model is centered around two primary tables: Events and Lineups.  The Events table models the matches the occur within a league competition and tracks their high-level information (which two teams played, where did they play, when did they play, what were the match conditions, and who was on the field and along the touchlines).  In order to enter match events, it's necessary to know the competition and the specific matchday that they refer to.  This information is captured in the Competitions and Rounds tables.  It is also necessary to know the league teams, the venues that will host league matches (neutral sites can be included), and the managers and referees.  We can also record the environmental conditions within the Environments table, which serves as a child table to Events.  Finally the Events table is connected to the Lineups table.   

The Lineups table contains the players who participate in the match (our Event), whether on the field or on the substitutes' bench.  To add records to this table we need to have the Teams, Players, and Positions tables populated.  The Positions table can be filled in advance with positions and their flanks (Left, Right, Central) — Goalkeeper is always Central flank, but Defender, Midfielder, and Forward can have either flank descriptor.  The Players table is different from other similar tables in that there is no explicit connection to a Position nor to the Team for which they play.  This connection is made in the Lineup table which includes foreign keys from the Events, Teams, and Players tables.  A record in the Lineups table tracks the match that a player participated in and the team for which he played.  There is also a flag that indicates whether the player is starting.  There will have to be logic to require that one and only one starter be a goalkeeper.  

Only players in the Lineup (and on the field!) can score goals, be involved in penalties, get booked, and be substituted, so there are links from the Lineup table to the Goals, Penalties, Offenses, and Substitutions tables.  There is also a ShiftPositions table to account for either tactical changes in position or enforced changes due to the expulsion of a goalkeeper. 

There are a couple of issues that I noticed with the database.  The first is that there needs to be a linking table between the Teams and Venues tables. It is possible for clubs to play at multiple stadiums in a single season, or over many seasons (a certainty with clubs leaving old grounds for new ones), and stadiums can host multiple teams (e.g. San Siro, Home Depot Center, Olympiastadion in Munich until Bayern left).  I currently assume a one-to-many relationship between Teams and Venues, but I will need to change that.  The second issue is the Players table.  I am still debating the use of a linking table between Teams and Players, which would make selection of players in the lineup a little less unwieldy, but at this time I'm not sure how to model situations in which a player switches clubs in the same league.  

At any rate, I hope you have a better understanding of my database design.  There are still some tweaks to be done, but I think it's a very good start.

Share