I’ve heard this statistic cited frequently: “The odds of a perfect NCAA tournament bracket are 1 in 2^63.”

For those of you who don’t live in the USA, college basketball has an annual tournament. 64 teams qualify, and they play a single-elimination tournament. The bracket is divided into 4 quarters, with teams seeded #1 to #16 in each quarter.

Many workplaces have a betting pool where people fill out a bracket, trying to predict the results. The tournament was recently expanded to 68 teams, with 4 preliminary games and 60 teams getting a bye. However, most bracket pools only cover the main 64-team field, so I’m going to ignore the 4 play-in games as well.

The false statistic is “The odds of a perfect bracket are 1 in 2^63.” Do you see the fallacy?

SPOILER SPACE

SPOILER SPACE

SPOILER SPACE

SPOILER SPACE

SPOILER SPACE

SPOILER SPACE

SPOILER SPACE

SPOILER SPACE

SPOILER SPACE

SPOILER SPACE

“1 in 2^63” assumes that you flip a coin to determine the winner in each game. That is ridiculous. No human would pick that way if they were trying to win. As of 2013, no #16 seed has ever beaten a #1 seed, but when you flip coins you predict at least one such upset 15 out of 16 times. (There are four #1 seeds and four #16 seeds, one in each quarter of the bracket. Technically, they are the overall seeds 1-4 and 61-64, but to keep things simple they are referred to as #1 and #16 seeds.)
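The two numbers above can be checked by hand. A 64-team single-elimination field plays 63 games (each game eliminates exactly one team), and a coin flip gives each of the four #1 vs. #16 games a 1/2 chance of an upset, independently of the others:

```latex
\[
2^{63} = 9{,}223{,}372{,}036{,}854{,}775{,}808 \approx 9.2 \times 10^{18}
\]
\[
P(\text{at least one \#16 beats a \#1}) = 1 - \left(\tfrac{1}{2}\right)^{4} = \tfrac{15}{16}
\]
```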

You can use Jeff Sagarin’s ratings to predict the winner when two teams play. The difference between the two ratings converts to the probability of the stronger team winning. I wrote a program to simulate the tournament results using Sagarin ratings.
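To make the idea concrete, here is a minimal sketch of simulating one tournament. It is not the actual program described in this post. The function names, the bracket-order input, and the one-bit-per-game encoding are all assumptions made for illustration; the only piece taken from the post is the logistic formula with the 0.15 factor, which appears at the end.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Chance that a team rated r1 beats a team rated r2.
// Logistic form with the 0.15 scale factor quoted at the end of this post.
double win_probability(double r1, double r2) {
    return 1.0 / (1.0 + std::exp(-0.15 * (r1 - r2)));
}

// Plays one single-elimination tournament (hypothetical helper).
// `ratings` lists the teams in bracket order: teams 0 and 1 meet first,
// teams 2 and 3 next, and so on. Its size must be a power of two, at most 64.
// Returns one bit per game, in playing order: 0 if the first-listed team of
// the pairing won, 1 otherwise. A 64-team field has 63 games, so the whole
// outcome fits in one 64-bit integer.
std::uint64_t simulate_tournament(std::vector<double> ratings, std::mt19937_64& rng) {
    assert(ratings.size() >= 2 && ratings.size() <= 64);
    assert((ratings.size() & (ratings.size() - 1)) == 0);  // power of two
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::uint64_t outcome = 0;
    int game = 0;
    while (ratings.size() > 1) {
        std::vector<double> winners;
        for (std::size_t i = 0; i + 1 < ratings.size(); i += 2) {
            bool first_wins = uniform(rng) < win_probability(ratings[i], ratings[i + 1]);
            if (!first_wins) outcome |= (std::uint64_t{1} << game);
            winners.push_back(first_wins ? ratings[i] : ratings[i + 1]);
            ++game;
        }
        ratings = winners;  // survivors stay in bracket order for the next round
    }
    return outcome;
}
```

Packing each result into a single integer keeps memory use modest: 1E8 outcomes at 8 bytes each is about 800 MB, and comparing two brackets is a single integer comparison.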

Originally, I wrote the program in PHP, but it was too slow and ran out of memory. I rewrote it in C++, and the performance was great: it took only 10-20 minutes to simulate 100,000,000 (1E8) tournaments and parse the results. Try doing that in Java or any interpreted or bytecode language!

My program used the 2012 NCAA tournament seeds and ratings. The 2013 tournament data should lead to a similar result.

In 1E8 simulations, there were 30 pairs of simulations that had the exact same result. There were no triples. 1E8 simulations give about 5E15 possible comparisons (1E8 choose 2). Comparing two simulations is like checking one predicted bracket against one actual outcome, so the odds of a perfect bracket are about 30 in 5E15, or 1 in 1.7E14, or roughly 1 in 2^47.
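The counting step can be sketched as follows. This is an illustration under assumptions, not the original code: it assumes each simulated tournament has been stored as one 64-bit integer, and both function names are hypothetical. Sorting groups identical outcomes together, and a value that appears k times contributes k(k-1)/2 matching pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts unordered pairs of identical outcomes. A value that occurs k times
// contributes k * (k - 1) / 2 pairs.
std::uint64_t count_matching_pairs(std::vector<std::uint64_t> outcomes) {
    std::sort(outcomes.begin(), outcomes.end());
    std::uint64_t pairs = 0;
    std::size_t i = 0;
    while (i < outcomes.size()) {
        std::size_t j = i;
        while (j < outcomes.size() && outcomes[j] == outcomes[i]) ++j;
        std::uint64_t k = j - i;  // size of this run of equal outcomes
        pairs += k * (k - 1) / 2;
        i = j;
    }
    return pairs;
}

// Returns log2(X) for odds of "1 in X" that two independent simulations agree,
// estimated as (simulations choose 2) / matching_pairs.
double log2_odds(std::uint64_t matching_pairs, std::uint64_t simulations) {
    assert(matching_pairs > 0 && simulations >= 2);
    double n = static_cast<double>(simulations);
    double comparisons = n * (n - 1.0) / 2.0;  // "n choose 2"
    return std::log2(comparisons / static_cast<double>(matching_pairs));
}
```

With the figures from this post, `log2_odds(30, 100000000)` comes out a little above 47, which is where the 2^47 estimate comes from.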

The odds of picking a perfect NCAA bracket are not 1 in 2^63, because a coinflip picks too many ridiculous upsets. If Sagarin ratings are a good predictor of the winner and you use a Sagarin simulation to pick your bracket, the odds of a perfect bracket are approximately 1 in 2^47.

How do you convert the ratings advantage into the odds of winning?

For example, I’m using the Maclaurin series to estimate the inverse error function erf^-1() so I can predict how many times the favorite wins in 1000 trials. I use the Sagarin rating as the mean for each team. For example, if the favorite (Louisville in 2013) has a rating of 95.01 and the underdog (Mississippi) a rating of 85.03, I can calculate the number of wins… But I have to assume a value for sigma. In my program, if I use 10, the favorite wins about 76% of the time. But if I use 25, they win only 61% of the time. How can I estimate an appropriate sigma?
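The 76% and 61% figures are consistent with one particular model, sketched below. This is an assumption about what the question describes, not something the question states: each team's performance is an independent normal variable with the team's rating as its mean and a shared standard deviation sigma. The function name is hypothetical. Under that model the win probability has a closed form using `std::erf` from `<cmath>`, so no series expansion or random trials are needed. It does not answer how to choose sigma; it only reproduces the numbers.

```cpp
#include <cassert>
#include <cmath>

// Chance that the favorite outperforms the underdog, ASSUMING each team's
// performance is an independent normal variable whose mean is its rating and
// whose standard deviation is sigma. The difference of the two performances
// is then normal with mean (favorite - underdog) and standard deviation
// sigma * sqrt(2), so
//   P = Phi(diff / (sigma * sqrt(2))) = 0.5 * (1 + erf(diff / (2 * sigma))).
double normal_win_probability(double favorite, double underdog, double sigma) {
    assert(sigma > 0.0);
    return 0.5 * (1.0 + std::erf((favorite - underdog) / (2.0 * sigma)));
}
```

For ratings of 95.01 and 85.03, this gives about 0.76 with sigma = 10 and about 0.61 with sigma = 25, matching the percentages quoted in the question.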

    function calculate_winning_chance($t1, $t2)
    {
        $diff = $t1 - $t2;
        return 1.0 / (1 + exp(-0.15 * $diff));
    }

$t1 is the Sagarin Rating of the favorite, $t2 is the Sagarin Rating of the underdog (but only the rating difference matters).

exp(x) is the exponential function, e^x.
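As a worked example, plugging the Louisville and Mississippi ratings from the question (95.01 and 85.03) into this formula gives:

```latex
\[
P = \frac{1}{1 + e^{-0.15\,(95.01 - 85.03)}}
  = \frac{1}{1 + e^{-1.497}}
  \approx \frac{1}{1 + 0.2238}
  \approx 0.817
\]
```

So this formula makes the favorite about an 82% pick, somewhat more confident than the 76% the question reports for sigma = 10.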