Last August, I wrote something about my expected saves model and used it to assess goalkeeper performance in Argentina’s Primera División. To be honest, I was deeply unsatisfied with the model. It yielded a very optimistic expectation of saves made and as a result the expected goals allowed was extremely low. Every goalkeeper in the list let in way more goals than expected, which wasn’t all that interesting for analysis purposes.
To remedy the problems that I observed with the model, I focused on two features: the use of an intercept in the logistic coefficients and the weighting of the save/no save events. The intercept sets the predicted probability when all the other parameters that describe a shot are zero. Because of the way that I code plays, such an event is a penalty shot from right in front of the center of the goal line that goes straight. Penalty shots, of course, don’t occur from the goal line, so that baseline shot never happens in practice and it’s possible to remove the intercept. I evaluated the model performance with and without the intercept, and on the current Superliga data the intercept added 1.5 saves to the average expected saves total and shrank the expected goals allowed total by 1.5 goals.
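To make the role of the intercept concrete, here is a minimal sketch of a logistic prediction for the baseline shot, assuming the model outputs the probability of a save. The coefficients, the intercept value, and the two-feature layout are hypothetical placeholders, not the fitted values from my model.

```python
import math

def save_probability(features, coefficients, intercept=0.0):
    """Logistic probability for one shot: sigmoid(intercept + sum(coef * feature))."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two made-up shot features (illustration only).
coefficients = [-0.8, 0.3]

# The baseline shot: every shot parameter is zero.
baseline_shot = [0.0, 0.0]

# With an intercept, the baseline probability is sigmoid(intercept).
with_intercept = save_probability(baseline_shot, coefficients, intercept=1.2)

# Without an intercept, the baseline probability is pinned to exactly 0.5.
without_intercept = save_probability(baseline_shot, coefficients)

print(f"with intercept:    {with_intercept:.3f}")
print(f"without intercept: {without_intercept:.3f}")
```

A positive intercept lifts the predicted probability of every shot, not just the baseline one, which is consistent with the intercept inflating the expected saves total.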
The weighting of the save/no save shot events turned out to be much more significant. Compared to other unbalanced data sets, the ratio between shots saved and not saved isn’t huge at just over 2:1. But on a team-by-team level, the ratio varied from 2:1 to more than 4:1. Switching to a balanced weighting of classes in the model had a huge effect on the coefficients associated with the shot parameters and the expected saves and goals allowed. On average the number of expected saves shrank by 3.4 saves between a model with uniform weighting and a model with balanced weighting. But the latter model increased expected goals allowed by an average of 7.4 goals, which brought the totals much closer to the actual number of goals allowed.
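For readers unfamiliar with balanced weighting, the sketch below shows one common way to compute it: each class is weighted by n_samples / (n_classes * class_count), which is the heuristic behind scikit-learn's class_weight="balanced" option. That this is exactly the weighting used in my model is an assumption, and the shot counts are made up to mimic a 2:1 ratio.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical shot outcomes at a 2:1 ratio: 1 = saved, 0 = not saved.
labels = [1] * 200 + [0] * 100

weights = balanced_class_weights(labels)

# Total weight carried by each class after reweighting.
total_weight = {cls: weights[cls] * labels.count(cls) for cls in weights}

print(weights)
print(total_weight)
```

After reweighting, both classes carry the same total weight in the loss, so the model is no longer rewarded for leaning toward the more frequent outcome.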
In the end, I decided to implement the expected saves model with balanced class weighting and no intercept.
Expected Saves Results
Now it’s time to show some results from this season’s Argentine Superliga. The event data has been supplied by DataFactory LatAm and analyzed to produce expected saves and expected goals allowed by each team. This analysis focuses on the 31 goalkeepers who have played at least 630 minutes in the league as of the end of Round 19.
Below are the tabulated and calculated totals for shots on goal, actual and expected saves, and actual and expected goals allowed. Own goals have been ignored. There may be discrepancies between the total shots on goal calculated by DataFactory and those calculated by me.
The total number of goals allowed remains larger than the total number of expected goals allowed, but the discrepancy is not as wide as indicated by the previous expected saves model. More interestingly, some goalkeepers have allowed fewer goals than expected and others more goals than expected, which is what I was hoping to see when I created the model.
The above chart is sorted by the number of goals allowed above expected (which I call GAAx), and Guido Herrera of Talleres heads the list. You can also see other goalkeepers whom you might expect, such as Agustín Rossi of Boca Juniors, and perhaps those you might not, such as Jeremías Ledesma of Rosario Central and Marcos Díaz of Colón (Santa Fe). Patronato has shown the greatest discrepancy between expected and actual goals allowed, which accounts for their much higher-than-expected league position. I was hoping that the goalkeeping statistics would provide some insight into this surprise, but the performance of first-choice ’keeper Sebastián Bértoli has been just about in line with expectations.
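As a small illustration of the GAAx calculation and the sort order, here is a sketch that takes GAAx as actual goals allowed minus expected goals allowed and ranks goalkeepers from most negative to most positive. The keeper names and numbers are invented for the example and are not taken from the chart.

```python
def gaax(goals_allowed, expected_goals_allowed):
    """Goals allowed above expected; negative means fewer goals than expected."""
    return goals_allowed - expected_goals_allowed

# Hypothetical goalkeepers and totals (illustration only, not real data).
keepers = [
    {"name": "Keeper A", "ga": 18, "xga": 21.5},
    {"name": "Keeper B", "ga": 25, "xga": 22.0},
    {"name": "Keeper C", "ga": 20, "xga": 20.4},
]

# Ascending sort: the goalkeeper who beat expectations by the most comes first.
ranked = sorted(keepers, key=lambda k: gaax(k["ga"], k["xga"]))

for k in ranked:
    print(f'{k["name"]}: GAAx = {gaax(k["ga"], k["xga"]):+.1f}')
```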
So what do these figures look like over 90 minutes? Here is another chart, once again with own goals ignored.
I’ve sorted these figures by actual goals allowed over 90 minutes but also highlighted those goalkeepers whose GAAx/90 minutes is negative. It’s very interesting to see those goalkeepers who have allowed a low number of goals and allowed fewer than expected. In this light we see Herrera and Rossi but also Iván Arboleda of Banfield. It’s also interesting to see goalkeepers such as Nereo Fernández (Unión de Santa Fe) and Luís Unsaín (Defensa y Justicia), who have GA/90 below one but positive GAAx/90.
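The per-90 figures can be sketched as a simple rescaling of the season totals by minutes played, assuming the usual total * 90 / minutes convention. The 630-minute cutoff comes from the selection rule above; the function names and the sample totals are hypothetical.

```python
MIN_MINUTES = 630  # minimum minutes played, as used for this analysis

def per_90(total, minutes):
    """Scale a season total to a rate per 90 minutes played."""
    return total * 90.0 / minutes

def summarize(ga, xga, minutes):
    """Return GA/90, GAAx/90, and whether the keeper beat expectations."""
    ga90 = per_90(ga, minutes)
    gaax90 = per_90(ga - xga, minutes)
    return {"ga90": ga90, "gaax90": gaax90, "beat_expected": gaax90 < 0}

# Hypothetical totals over 19 full matches (1710 minutes), illustration only.
low_and_better = summarize(ga=18, xga=21.5, minutes=1710)
low_but_worse = summarize(ga=17, xga=15.0, minutes=1710)

print(low_and_better)
print(low_but_worse)
```

The second case is the pattern noted above: a goalkeeper can concede less than a goal per 90 minutes and still allow more than the model expects, typically because he faces relatively easy shots.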
In a future post, I’ll reevaluate the goalkeeper statistics for last season’s Argentine top flight and take a further look into some of the surprising results that I have seen above.