Do Pythagorean residuals provide insight into best manager performances?

The Pythagorean residual of a team, defined as the difference between its actual point total and the point total expected from its goal statistics, is often used as a rough measure of its under-performance or over-performance. But can that residual be used to assess managerial performance, too?
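To make the definition concrete, here is a minimal sketch that computes residuals and ranks teams by them. The team names and point totals are made up for illustration, and the expected points are simply taken as given; the sketch does not reproduce the soccer Pythagorean formula itself.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    team: str
    actual_points: int
    expected_points: float  # assumed to come from a Pythagorean model

    @property
    def residual(self) -> float:
        # Positive = over-performance, negative = under-performance.
        return self.actual_points - self.expected_points


# Hypothetical table rows, not real league data.
table = [
    TeamSeason("Team A", 77, 70.2),
    TeamSeason("Team B", 64, 66.5),
    TeamSeason("Team C", 41, 35.8),
]

# Rank from biggest over-performer to biggest under-performer.
ranked = sorted(table, key=lambda t: t.residual, reverse=True)
for t in ranked:
    print(f"{t.team}: {t.residual:+.1f}")
```

Note that a mid-table or struggling side (Team C here) can rank near the top on residual, which is why the residual alone says little about where a team finished.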

I’ve looked at Pythagorean tables for the most recent seasons of the English Premier League and noticed that in four of the last five seasons the Premier League Manager of the Season has come from a club with one of the two or three highest Pythagorean residuals in the table. Alan Pardew (2011-12) and Alex Ferguson (2012-13) managed teams with the highest Pythagorean residuals in the competition, and Ferguson (2010-11) and José Mourinho (2014-15) managed clubs that were very close to the highest residual. But if you look at the list of Manager of the Season recipients since the 1999-2000 season, you see something different:

English Premier League Managers of the Season since the 1999-2000 season, their clubs’ Pythagorean residuals, and the clubs with the highest Pythagorean residuals in the competition.

Since the 1999-2000 season, six winners of the Manager of the Season award managed the club that was the biggest overachiever in the competition, according to the soccer Pythagorean table. If you want to be generous and include those managers whose clubs were within a point of the highest residual, there are nine Managers of the Season whose teams were among the biggest over-performers. So in roughly half of the 16 seasons, the award went to the manager of one of the most over-performing teams in the Premier League. It’s possible that the Pythagorean residual isn’t very informative when it comes to assessing managers, and some baseball sabermetricians have been similarly pessimistic about its utility.

So it appears that the power of the Pythagorean residual to identify the Manager of the Season may not be much better than flipping a coin. Could actual points won, together with the Pythagorean residual, predict which team is coached by the Manager of the Season? Let’s build a support vector machine (SVM) classifier to find out.

Support vector machines were invented by Vladimir Vapnik and Alexey Chervonenkis in the 1960s and refined by Vapnik and collaborators in the 1990s. They are used primarily to draw boundaries in feature space that classify data points into one of two (or more) categories. You can use logistic regression to classify data points as well, but support vector machines have the twin advantages of being more robust and of creating nonlinear boundaries more easily through kernel functions.

For this classifier, the independent variables are the actual points won by a team in a given season and the expected point total for that team; the Pythagorean residual is the difference between the two. The dependent variable indicates whether the team’s manager won the Manager of the Season award (yes=1, no=0). The SVM classifier uses a radial basis function kernel (C=1.0, γ=1.0) and is trained with 11 seasons (220 data points) of end-of-season point totals for Premier League teams. Five seasons of data, or 100 data points, are reserved to test the performance of the SVM.
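A setup like this could be sketched with scikit-learn roughly as follows. This is not the code behind the results in this post: the data are synthetic, the rule that picks one "winner" per season is invented, and both the feature standardization and the random choice of held-out seasons are assumptions of the sketch. Only the kernel, C, γ, and the 220/100 split come from the description above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_seasons, n_teams = 16, 20

# Synthetic stand-in data (NOT real league tables).
expected = rng.normal(52, 15, size=(n_seasons, n_teams))
actual = expected + rng.normal(0, 5, size=(n_seasons, n_teams))

# Hypothetical labelling rule: one winner per season, chosen as the team
# with the highest (actual points + residual). Purely illustrative.
score = actual + (actual - expected)
labels = np.zeros((n_seasons, n_teams), dtype=int)
labels[np.arange(n_seasons), score.argmax(axis=1)] = 1

# Hold out five whole seasons for testing (assumption: chosen at random).
test_seasons = rng.choice(n_seasons, size=5, replace=False)
is_test = np.isin(np.arange(n_seasons), test_seasons)


def flatten(mask):
    """Stack the selected seasons into (n_samples, 2) features and labels."""
    X = np.column_stack([actual[mask].ravel(), expected[mask].ravel()])
    y = labels[mask].ravel()
    return X, y


X_train, y_train = flatten(~is_test)  # 11 seasons -> 220 rows
X_test, y_test = flatten(is_test)     # 5 seasons  -> 100 rows

# RBF-kernel SVM with C=1.0 and gamma=1.0; scaling is this sketch's choice.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma=1.0))
clf.fit(X_train, y_train)

# Rows are the truth (No, Yes); columns are the prediction (No, Yes).
cm = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1])
print(cm)
```

Splitting by whole seasons rather than by individual rows keeps each season's single award winner and its 19 non-winners on the same side of the split. Repeating the random choice of test seasons with different seeds gives a set of confusion matrices that can then be averaged.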

I ran the SVM at least ten times to assess the average performance of the classifier on the test set. Here is the averaged confusion matrix:

              Predicted No    Predicted Yes
Truth No          82.7            12.3
Truth Yes          1.4             3.6
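The usual summary metrics follow directly from those four averaged cells. The short calculation below uses only the numbers in the matrix; treating "Yes" as the positive class is the one convention it assumes.

```python
# Averaged confusion-matrix cells from the test set (100 rows per run).
tn, fp = 82.7, 12.3   # truth = No:  predicted No / predicted Yes
fn, tp = 1.4, 3.6     # truth = Yes: predicted No / predicted Yes

total = tn + fp + fn + tp
accuracy = (tp + tn) / total    # share of all teams classified correctly
precision = tp / (tp + fp)      # share of predicted winners who actually won
recall = tp / (tp + fn)         # share of actual winners that were flagged

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

On average the classifier flags about 16 teams across the five test seasons and catches 3.6 of the 5 actual winners, so it is better read as a generator of shortlists than as a predictor of a single winner. The high accuracy mostly reflects how many non-winners there are.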

Here is what a sample classifier looks like, overlaid with points from the test data set:

An SVM classifier to predict candidates for Premier League Manager of the Season. Red dots = award winners.

The classifier assigns teams with leading point totals and Pythagorean residuals greater than zero to the Manager of the Season category. In most years, this is a reasonable thing to do, as almost all of the Premier League Managers of the Season have managed the top teams in the league and/or guided them to perform much better than an average team with similar goal statistics. However, such a classifier will miss winners such as George Burley with Ipswich Town or Tony Pulis with Crystal Palace, two sides that beat preseason expectations rather than statistical expectations from matches already played. Nonetheless, the classifier, as simplistic as it is, does a fairly good job of identifying possible winners of Manager of the Season, as long as those managers lead clubs near the top of the table. Does it provide insight that an observer couldn’t obtain by considering the champion or another team in the top three? No.

It remains to be seen if this classifier can do as good a job of predicting Manager of the Season for this season (2015-16).  I’ll revisit this question after Spurs’ match later today.

UPDATE (4/25, 23:55): With the latest results, I reran the classifier and predicted the contenders for Manager of the Season for 2015-16.  Out of the eight runs that I conducted, Leicester City was a positive hit every time, followed by Tottenham Hotspur and (rarely) Arsenal and Manchester United.  It’s almost certain that Claudio Ranieri will win the Manager of the Season award, so it’s not like the classifier is providing any special insight.