This is a soccer-centric blog, and I aim to keep it that way, but I also know that a lot of basketball analysts read this site. With that in mind I am going to write about something basketball-related that draws upon my experience developing league coefficients for CONCACAF club competitions in soccer. It's also timely, with March Madness (the NCAA Division I men's basketball tournament) approaching its conclusion.
The overwhelming majority of readers of this site come from the USA, but a significant number are from other parts of the world, so a little background is in order. College basketball, like all other North American sports leagues, has a playoff format to decide its champion. The format is a 64-team (68 as of this year) knockout tournament in which teams are seeded and play each other at neutral sites. An obvious parallel for people outside this country is the English FA Cup, without home matches or the possibility of a replay, but with the possibility of lower-league or non-league sides knocking off established powers. Another parallel is the old European Champions' Cup, in that a team had to win its national league in order to gain entry to the Champions' Cup. (This was actually the case with the NCAA tournament until the late 1970s!)
The difference between the FA Cup and the NCAA tournament is that at least half of the teams in the NCAA tournament are selected — the 31 conference champions receive automatic entry to the tournament, and the remainder are "at-large" teams that are selected by a committee using criteria such as team record, strength of schedule, conference strength, and other factors. It is the selection of these teams that is the most controversial, and every year there are debates over who was more deserving to enter the tournament ("body of work" has to be the most overused phrase in American sports talk today). Virginia Commonwealth University, one of the national semifinalists this year, was one such team; George Mason University (a 2006 Final Four participant) was another.
I propose that there has to be a better way to allocate at-large bids, and my idea borrows concepts from soccer competitions and even political science. The result is an allocation scheme that accounts for past performance by conference teams in previous NCAA tournaments, lets both power and mid-major conferences know exactly how many NCAA slots they will receive at the start of the season, and makes the regular season much more meaningful. Moreover, my algorithm is not very complicated; it is a little labor-intensive, but it requires only some basic math and the sort button in a spreadsheet.
The inspiration for my approach comes from the domestic league allocations for the European soccer competitions (Champions League and Europa League). The allocations are determined by calculating a league coefficient from club results in the two competitions over the last five seasons, with bonus points awarded for advancing to later phases of the competition. After ranking the national leagues with this coefficient, UEFA assigns the number of qualifying places to each league based on its position: the top four leagues receive four places in the Champions League, leagues in places 5-8 receive three, and so forth. I don't believe that such an allocation scheme would be appropriate for at-large bids, as not every conference would be deserving of extra places based on previous results. Enter a concept from proportional representation in political science.
The largest remainder method is one form of proportional representation for assemblies with party list systems. It assigns seats to parties based on the number of votes that they receive relative to a quota, which is the ratio of the total number of (valid) votes cast to the number of seats in play. (This particular ratio is called the Hare quota.) Each party first receives the integer part of its votes divided by the quota; subtracting that integer from the quotient leaves the remainder, or decimal part. If you sum up all of the integers, there will almost always be some unallocated seats. The parties are then ranked by remainder, and those with the largest remainders are each given one additional seat until there are no more unallocated seats. Here is one example:
|Party A||Party B||Party C||Party D||Total|
Now, replace "parties" with "conferences", "seats" with "bids", and "votes" with "coefficient" and you have a pretty good mechanism for allocating at-large bids to the NCAAs.
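For readers who prefer code to spreadsheets, here is a minimal Python sketch of the largest remainder method with a Hare quota. The function name `largest_remainder` and the vote totals are made up purely for illustration (they are not the figures from my example), and ties in the remainders are simply broken by the order in which the parties are listed.

```python
from math import floor

def largest_remainder(votes, seats):
    """Allocate `seats` among the keys of `votes` using the Hare quota."""
    quota = sum(votes.values()) / seats            # Hare quota: total votes / seats
    ratios = {name: v / quota for name, v in votes.items()}
    # Step 1: every party receives the integer part of votes / quota.
    allocation = {name: floor(r) for name, r in ratios.items()}
    remainders = {name: ratios[name] - allocation[name] for name in votes}
    # Step 2: hand the unallocated seats to the largest remainders.
    leftover = seats - sum(allocation.values())
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for name in ranked[:leftover]:
        allocation[name] += 1
    return allocation

# Hypothetical vote totals, chosen only to illustrate the mechanics.
votes = {"Party A": 4700, "Party B": 2900, "Party C": 1600, "Party D": 800}
result = largest_remainder(votes, seats=10)
print(result)
```

With these invented numbers the quota is 1,000 votes per seat, the integer parts hand out 7 of the 10 seats, and the remaining 3 go to Parties B, D, and A, which have the largest remainders.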
So the "conference coefficient" is rather simple. For each conference in a given season you tabulate the number of teams that advance to the play-in round, then to the first round, second, third (Sweet Sixteen), and subsequent rounds up to the final survivor. The result is the following expression:
Season Coeff = C1*N1 + C2*N2 + C3*N3 + C4*N4 + C5*N5 + C6*N6 + C7*N7 + C8*N8
C_i is the weight applied to round i of the competition, and N_i is the number of the conference's teams that reached that round. I used C1=0.5, C2=1, C3=2, and so on to C8=7. I use the previous five seasons to get a conference coefficient, which is either an unweighted sum of the season coefficients or a weighted one, with S_0 the most recent season's coefficient and S_4 the oldest:
Conference Coeff = 1.0*S_0 + 0.5*S_1 + 0.25*S_2 + 0.125*S_3 + 0.0625*S_4
I try my allocation with both kinds of sums. I do not use an average to compute the conference coefficient; the allocation doesn't work very well in that case, most likely because you're no longer allocating bids from the sum total of "votes" or "wins".
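The two coefficient formulas can be sketched in a few lines of Python. This is only an illustration: the function names and the team counts are hypothetical, and I am assuming that the list of season coefficients runs from the most recent season (S_0) to the oldest (S_4).

```python
# Round weights from the text: C1 = 0.5 (play-in), C2 = 1, C3 = 2, ..., C8 = 7.
ROUND_WEIGHTS = [0.5, 1, 2, 3, 4, 5, 6, 7]
# Season weights for the weighted sum: 1.0, 0.5, 0.25, 0.125, 0.0625.
DECAY = [0.5 ** k for k in range(5)]

def season_coefficient(teams_per_round):
    """Season Coeff = C1*N1 + ... + C8*N8 for one conference in one season."""
    if len(teams_per_round) != len(ROUND_WEIGHTS):
        raise ValueError("expected one team count per round (8 rounds)")
    return sum(c * n for c, n in zip(ROUND_WEIGHTS, teams_per_round))

def conference_coefficient(season_coeffs, weighted=False):
    """Sum five season coefficients, listed most recent first (assumption)."""
    if len(season_coeffs) != 5:
        raise ValueError("expected five season coefficients")
    weights = DECAY if weighted else [1.0] * 5
    return sum(w * s for w, s in zip(weights, season_coeffs))

# Hypothetical conference: 1 play-in team, 5 first-round teams, 3 in the
# second round, 2 in the Sweet Sixteen, 1 reaching each of the next two rounds.
latest = season_coefficient([1, 5, 3, 2, 1, 1, 0, 0])

# Hypothetical five-season history, most recent season first.
seasons = [latest, 20.0, 16.0, 8.0, 32.0]
raw = conference_coefficient(seasons)
decayed = conference_coefficient(seasons, weighted=True)
print(latest, raw, decayed)
```

Note how the weighted version discounts the strong season from four years ago: it contributes 32 points to the unweighted sum but only 2 to the weighted one.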
To obtain the Hare Quota, I sum the coefficients over all of the conferences and divide the total by the number of at-large bids in play (37). Then I follow the procedure that I stated above for the largest remainder method in order to calculate the number of at-large bids per conference.
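Written out in symbols, the procedure looks like this. The notation is introduced here only for illustration: K_j is the coefficient of conference j, B = 37 is the number of at-large bids, Q is the Hare quota, a_j is the integer allocation, r_j is the remainder, and R is the number of bids still unallocated after the integer step.

```latex
Q = \frac{1}{B} \sum_{j} K_j , \qquad
a_j = \left\lfloor \frac{K_j}{Q} \right\rfloor , \qquad
r_j = \frac{K_j}{Q} - a_j , \qquad
R = B - \sum_{j} a_j
```

The R conferences with the largest r_j each receive one extra bid, so the final allocations always sum to exactly B.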
So now that I've written almost 1000 words on the allocation scheme, here are some results. The first set of results is for a 5-year unweighted conference coefficient, with totals, the Hare quota, and the at-large allocation results.
|Conference||5-Yr Raw||Ratio||Integer||Remainder||Residual Bids||Total At-Large|
|At Large Bids||37||27||10||37|
I repeat the process with a weighted 5-year conference coefficient. You can see that the allocation method concentrates the at-large distribution among the power conferences, but mid-major conferences whose teams have gone deep in the tournament recently are rewarded.
|Conference||5-Yr Weighted||Ratio||Integer||Remainder||Residual Bids||Total At-Large|
|At Large Bids||37||27||10||37|
The end result of the allocation scheme is that while the bulk of the at-large bids go to teams from the power (BCS) conferences, the mid-major conferences are represented more fairly. The distribution of at-large bids with this approach is 27 to the BCS conferences and 10 to the mid-majors, compared to 30 for the power conferences and 7 to the mid-majors in this year's tournament. The closeness of the allocation results to those arrived at by the NCAA Selection Committee adds a degree of credibility to the process, in my opinion.
So would such an approach ever be adopted by the NCAA? I admit to being pessimistic about the prospects. The members of the Selection Committee have consistently stated that past conference performance does not carry any weight in the number of teams selected, so I doubt that they would consider an approach that does take into account conference performance in the national championship tournament. Moreover, this approach takes a lot of power out of the hands of the selection committee by removing the "invitation" aspect from the NCAA tournament. The counterargument is that such an allocation scheme would free up the committee's time to argue seeding of the teams, which is just as controversial as who actually gets into the tournament. In the end, such a proposal would require someone to champion it, whether someone connected with the smaller conferences as either a coach or director, or some other advocate of the mid-major conferences.
So that is my proposal for a better and more predictable allocation of at-large slots to the NCAA tournament. I'm pretty sure there will be varying opinions on my idea, but I welcome your comments all the same.
UPDATE: Before I forget, here are my spreadsheets in OpenOffice and Excel formats: