Can we express uncertainty in MLS salary data?

The MLS Front-Office Efficiency ratings always generate a lot of discussion, from their methodology to their use of salary data from the MLS Players Union surveys.  Much has been written about the accuracy of the salary data and the inability to fully account for outlays on the clubs’ (and MLS’) books.  Steve Fenn wrote a comprehensive post on the subject in 2014 and was prominent in the conversation on Twitter.

Let’s put aside the issue of club expenditures on player academies, which have never been public and won’t be for a while (I’m all-or-nothing on their availability before I incorporate them in an efficiency model).  I believe that the conversation about flawed salary figures should be framed in another way: as salary uncertainty.  By communicating our confidence in the figures in terms of uncertainty, we could continue to use them in follow-on calculations while remaining clear-eyed about the range of possible values.

So how would you formulate that uncertainty?  MLS HQ knows the true amount, but they won’t make it public.  Club officials have a better idea, but such information is off the record.  Dedicated fans have some idea, but they are working off fuzzy or just plain wrong assumptions.  One solution that comes to mind is to use Bayesian analysis and encode these opinions as a prior distribution over the actual salary, given the reported one.  Here’s how one such prior might look:

One possible prior function that describes uncertainty in MLS player salaries.

This prior incorporates the following assumptions about salary data:

  • Uncertainty in the actual salary depends strongly on the reported salary amount.
  • It’s very likely that the uncertainty region is not symmetric around the reported salary.
  • We have high confidence that players reported as making the league minimum are indeed making that salary.
  • As the reported salary increases, we expect the uncertainty bound to widen, especially the upper bound.
  • We are most uncertain about the salaries of the highest-paid players in the league, our Designated Players.
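
The assumptions above can be sketched as a sampling function.  This is only one rough way to encode them, not the prior shown in the figure: it assumes a two-piece normal on the log-salary scale, and the league-minimum figure and both spread rates (`LEAGUE_MIN`, `SPREAD_LOW`, `SPREAD_HIGH`) are made-up placeholders.

```python
import math
import random

LEAGUE_MIN = 60_000.0   # hypothetical league-minimum salary, placeholder only
SPREAD_LOW = 0.02       # assumed downside spread per log-unit above the minimum
SPREAD_HIGH = 0.08      # assumed upside spread (wider than the downside)


def prior_spreads(reported):
    """Return (sigma_low, sigma_high) on the log scale for a reported salary."""
    # Both spreads are zero at the league minimum and widen as salary grows.
    scale = math.log(max(reported, LEAGUE_MIN) / LEAGUE_MIN)
    return SPREAD_LOW * scale, SPREAD_HIGH * scale


def sample_actual_salary(reported, rng):
    """Draw one plausible actual salary for a given reported salary."""
    sigma_low, sigma_high = prior_spreads(reported)
    if sigma_low + sigma_high == 0.0:
        return reported  # league-minimum players: taken at face value
    # Two-piece normal: pick a side with probability proportional to its
    # spread, which keeps the density continuous at the reported value.
    if rng.random() < sigma_high / (sigma_low + sigma_high):
        offset = abs(rng.gauss(0.0, sigma_high))
    else:
        offset = -abs(rng.gauss(0.0, sigma_low))
    # Never go below the (assumed) league minimum.
    return max(LEAGUE_MIN, reported * math.exp(offset))


rng = random.Random(42)
for reported in (60_000.0, 250_000.0, 4_000_000.0):
    draws = sorted(sample_actual_salary(reported, rng) for _ in range(2000))
    print(f"{reported:>12,.0f}: {draws[100]:,.0f} to {draws[1899]:,.0f}")
```

Each printed row gives a rough 90% range of plausible actual salaries for one reported figure; with these placeholder settings the range is zero at the minimum and stretches mostly upward for the largest contracts.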

The idea is that this prior distribution would be updated with player salary data to produce a posterior distribution that expresses where the uncertainty really lies.  The posterior may or may not make sense, but we won’t know until we run the analysis.  One could then propagate the uncertainty through a Monte Carlo analysis of squad utilization, total usable payroll, and ultimately the front-office efficiency rating.  I haven’t completely thought this through, but I would incorporate salary uncertainty using some approach along those lines.
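
As a rough illustration of the Monte Carlo step, here is a minimal sketch for total payroll.  It does not use a fitted posterior: `draw_salary` is a stand-in sampler with upside-skewed triangular noise on the log scale, and the league minimum, the noise bounds, and the roster are all invented for the example.

```python
import math
import random

LEAGUE_MIN = 60_000.0  # hypothetical league-minimum salary, placeholder only


def draw_salary(reported, rng):
    """Stand-in sampler: upside-skewed noise that grows with reported salary."""
    scale = math.log(max(reported, LEAGUE_MIN) / LEAGUE_MIN)
    # Triangular noise on the log scale, peaked at the reported value and
    # reaching further up than down (assumed bounds, not estimated ones).
    offset = rng.triangular(-0.05 * scale, 0.20 * scale, 0.0)
    return reported * math.exp(offset)


def simulate_payroll(reported_salaries, n_draws=5000, seed=0):
    """Return sorted simulated payroll totals for a list of reported salaries."""
    rng = random.Random(seed)
    totals = [
        sum(draw_salary(salary, rng) for salary in reported_salaries)
        for _ in range(n_draws)
    ]
    totals.sort()
    return totals


def interval(sorted_values, lower=0.05, upper=0.95):
    """Pick approximate lower and upper quantiles from sorted values."""
    n = len(sorted_values)
    return sorted_values[int(lower * n)], sorted_values[min(int(upper * n), n - 1)]


roster = [60_000.0, 150_000.0, 400_000.0, 3_000_000.0]  # made-up reported salaries
low, high = interval(simulate_payroll(roster))
print(f"Reported total: {sum(roster):,.0f}")
print(f"90% interval for simulated total: {low:,.0f} to {high:,.0f}")
```

The same loop could feed any downstream quantity, such as an efficiency rating, by replacing the sum with that calculation and summarizing the resulting draws as an interval rather than a single number.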

One other point that Fenn made was to treat the results less as a point in time and more as part of a trend that may or may not be occurring.  That’s probably a better approach given the inherent uncertainty in the data, and one that I’ll try to take in future posts.
