Calculating the odds of winning Mega Millions using Transact SQL
Mar
29
Written by:
Charles Flock
3/29/2012 4:51 PM
A look at the COMBIN function in SQL Server and how to turn what is normally a complicated series of spreadsheet calculations into a simple set operation on the database.
On Thursday morning my CTO, Joe, exclaims “Holy cow, the Mega Millions is $500 million!” This leads me to speculate as to whether or not it makes sense to buy every single combination of numbers, thus guaranteeing a winner, and what the odds of winning actually are. Not that we would actually be in a position to buy every number or have the time to do so, but it is interesting to see what the odds look like.
If, like me, you don’t really know anything about Mega Millions, here is a brief explanation, or you can just go to the Mega Millions web site and get the official explanation. Basically, there are two populations of numbered balls. The first population contains 56 uniquely numbered white balls, containing the numbers 1 through 56. The second population contains 46 uniquely numbered mega balls, containing the numbers 1 through 46. 5 white balls are randomly selected and 1 mega ball is randomly selected. Players buy a card which contains their picks for the white balls and the mega ball, and receive a jackpot based on how many numbers match.
A player wins a prize according to the following chart:
White Balls
|
Mega Balls
|
Prize
|
5
|
1
|
Jackpot
|
5
|
0
|
$250,000
|
4
|
1
|
$10,000
|
4
|
0
|
$150
|
3
|
1
|
$150
|
2
|
1
|
$10
|
3
|
0
|
$10
|
1
|
1
|
$3
|
0
|
1
|
$2
|
According to the chart, there are 9 possible ways to win.
Calculating the odds of winning a specific prize is an exercise in combinatorics. To perform this calculation, all we need to do is use the XLeratorDB COMBIN function and supply it with relevant data.
The first thing I did was to put all the relevant data into some variables.
DECLARE @n as int = 56 --the number of white balls
DECLARE @m as int = 46 --the number of mega balls
DECLARE @x as int = 5 --the number of white balls selected
DECLARE @jackpot as int = 500000000 --estimate of the size of the jackpot
DECLARE @c as TABLE (
n_balls int,
m_balls int,
prize int
)
INSERT INTO @c VALUES (5,1,@jackpot)
INSERT INTO @c VALUES (5,0,250000)
INSERT INTO @c VALUES (4,1,10000)
INSERT INTO @c VALUES (4,0,150)
INSERT INTO @c VALUES (3,1,150)
INSERT INTO @c VALUES (2,1,10)
INSERT INTO @c VALUES (3,0,7)
INSERT INTO @c VALUES (1,1,3)
INSERT INTO @c VALUES (0,1,2)
The calculation of the odds of winning is then straightforward:
DECLARE @combin as float = wct.COMBIN(@n,@x) * @m --number of combinations
SELECT n_balls as [White Balls]
,m_balls as [Mega Ball]
,Prize
,CASE m_balls
WHEN 1 then Odds
ELSE Odds /(@m - 1)
END as Odds
FROM (
SELECT *
,@combin /((wct.COMBIN(@x,c.n_balls) * wct.COMBIN(@n-@x,@x-c.n_balls))) AS Odds
FROM @c c
) n
This produces the following result.
White Balls Mega Ball Prize Odds
----------- ----------- ----------- ----------------------
5 1 500000000 175711536
5 0 250000 3904700.8
4 1 10000 689064.847058824
4 0 150 15312.5521568627
3 1 150 13781.2969411765
2 1 10 843.75287394958
3 0 7 306.251043137255
1 1 3 140.625478991597
0 1 2 74.8007866976578
Let’s look at the odds column and discuss how we came up with those numbers. In the first row we see that the odds of winning the Jackpot by matching the 5 white balls and the 1 mega ball are 1 in 175,711,526. What this means is that there are 175,711,526 possible combinations and only one of those combinations can be a winner. How did we come up with that number? There 3,819,816 possible combinations of 5 numbers drawn from the white balls. When we are talking about combinations, the order doesn’t matter, so the combination 1-2-3-4-5 is the same as the combination 2-3-4-5-1 and all the other possible permutations of the numbers 1 through 5.
We then multiply the number of white ball combinations by the number of mega balls, since we are only picking 1. The number of possible combinations is thus 3,819,816 * 46 which equals 175,711,526. This is what is being calculated in this statement.
DECLARE @combin as float = wct.COMBIN(@n,@x) * @m
The next row tells us that the odds of winning $250,000 by matching the 5 white balls and not matching the mega ball are 1 in 3,904,700.8. This number is the number of combinations divided by the number of non-winning mega ball numbers, of which there are 45. The odds are 175,711,526 / 45.
In the third row, we see that the odds of winning $10,000 are 1 in 689,064.85. There are 255 combinations of having exactly 4 of the winning white balls. How did we come up with that number? There 5 ways to combine the 4 winning numbers and there 51 remaining numbers to choose from. 5 * 51 = 255 and 175,711,526 / 255 = 689,064.85. The next row, the odds of winning $150 with 4 of the winning white balls and not the mega ball, are 689,064.85 / 45 = 15,312.55 or 1 in 15,312.55
Let’s change the SELECT statement a little bit, so that in addition to the odds, we can also see how many possible winning combinations there are, which should make it a little easier to follow how the odds are calculated.
SELECT n_balls as [White Balls]
,m_balls as [Mega Ball]
,Prize
,CASE m_balls
WHEN 1 then Odds
ELSE Odds/(@m - 1)
END as Odds
,CASE m_balls
WHEN 1 then Matches
ELSE Matches *(@m - 1)
END as Matches
FROM (
SELECT *
,@combin /((wct.COMBIN(@x,c.n_balls) * wct.COMBIN(@n-@x,@x-c.n_balls))) AS Odds
,((wct.COMBIN(@x,c.n_balls) * wct.COMBIN(@n-@x,@x-c.n_balls))) AS Matches
FROM @c c
) n
This produces the following result.
White Balls Mega Ball Prize Odds Matches
----------- ----------- ----------- ---------------------- ------------------
5 1 500000000 175711536 1
5 0 250000 3904700.8 45
4 1 10000 689064.847058824 255
4 0 150 15312.5521568627 11475
3 1 150 13781.2969411765 12750
2 1 10 843.75287394958 208250
3 0 7 306.251043137255 573750
1 1 3 140.625478991597 1249500
0 1 2 74.8007866976578 2349060
Now it’s easy to see that we calculated the odds as the number of possible combinations divided by the number of matches.
Knowing the number of matching combinations lets us do some other interesting calculations. First, let’s calculate the odds of any ticket being a winning ticket. This is simply the number of combinations divided by the sum of the matches. This SQL will do that calculation for us
SELECT @combin/sum(wct.COMBIN(@x,c.n_balls) * wct.COMBIN(@n-@x,@x-c.n_balls) * wct.COMBIN(@m-1,ABS(m_balls - 1))) as [Odds of winning]
FROM @c c
This produces the following result.
Odds of winning
----------------------
39.8883327135951
The odds of any ticket being a winning ticket of some kind are 1 in 40.
Let’s get back to the original proposition. Assuming that we had enough resources to buy every single combination, it would cost $175,711,526 for all the tickets. And since we have every single combination, we would be guaranteed of winning every single prize. It’s pretty easy to calculate the expected value of the winnings by simply multiplying the number of winning tickets by the value of the prizes.
SELECT sum(wct.VALUE(Prize) *
wct.COMBIN(@x,c.n_balls) * wct.COMBIN(@n-@x,@x-c.n_balls) * wct.COMBIN(@m-1,ABS(m_balls - 1))) as [Expected Value]
FROM @c c
This produces the following result.
Expected Value
----------------------
531979120
It seems like we could guarantee a $531,979,120 return on a $175,711,526 investment. And even if we took the $500,000,000 jackpot as a lump sum payment, and we estimate that amount to be $360 million, it seems like we could double our money.
The problem with this analysis is that the jackpot is paid to the holder(s) of the winning combination. If there is only one holder, then the entire jackpot is paid to the one holder. If there are multiple holders, then the jackpot is divided evenly among the number of winning entries. Thus, it would be good if we could come up with an estimate of how many jackpot winners there might be.
Estimating the number of jackpot winners will be based upon the number of tickets sold. Since the previous Mega Millions jackpot had sales of over $190,000,000, let’s assume that there is no chance that fewer than 200,000,000 tickets will be sold. We will also assume that ticketholders pick numbers randomly, which is more than likely not true, but it will keep things simple.
In order to do this, we need to turn our odds of winning the jackpot into a probability. The following SQL statement will calculate the odds of no jackpot winner given 200,000,000 entries:
DECLARE @n as int = 56 --the number of white balls
DECLARE @m as int = 46 --the number of mega balls
DECLARE @x as int = 5 --the number of white balls selected
DECLARE @probw as float = 1 /(wct.COMBIN(@n,@x) * @m)
DECLARE @tkts as int = 200000000
DECLARE @winners as int = 0
SELECT wct.COMBIN(@tkts, @winners) * POWER(@probw, @winners) * POWER(1-@probw, @tkts-@winners) as [prob 0]
This produces the following result.
prob 0
----------------------
0.320385858101892
If we wanted to calculate the probability of there being exactly one winner, we could simply change the @winners variable to 1
prob 1
----------------------
0.364672539115764
Thus, there is approximately a 32% probability that no one will win the jackpot and 36.5% probability that exactly one person will win the jackpot. But, it’s also possible that more than one person wins, so let’s expand our SQL to include multiple winners.
SELECT @tkts as tickets
,k as winners
,wct.COMBIN(@tkts, k) * POWER(@probw, k) * POWER(1-@probw, @tkts-k) as prob
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8)) winners(k)
This produces the following result.
tickets winners prob
----------- ----------- ----------------------
200000000 0 0.320385858101892
200000000 1 0.364672539115764
200000000 2 0.207540465281577
200000000 3 0.0787428718348599
200000000 4 0.0224068586771258
200000000 5 0.00510082806549692
200000000 6 0.000967651884280338
200000000 7 0.000157344228222171
200000000 8 2.23867236028533E-05
However, we also want to consider the multiple number of entries. So, let’s start at 200 million and go in increments of 25 million up to 500 million.
SELECT n
,k
,wct.COMBIN(n, k) * POWER(@probw, k) * POWER(1-@probw, n-k) as prob
FROM (
SELECT n * POWER(10, 6) as n
,k
FROM (VALUES (200),(225),(250),(275),(300),(325),(350),(375),(400),(425),(450),(475),(500) ) tickets(n)
CROSS APPLY(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8)) winners(k)
) m
Here are the first few rows of the resultant table.
n k prob
----------- ----------- ----------------------
200000000 0 0.320385858101892
225000000 0 0.277896134114208
250000000 0 0.241041417411943
275000000 0 0.209074390664538
300000000 0 0.18134684611916
325000000 0 0.157296541641645
350000000 0 0.136435799915507
375000000 0 0.118341619620554
400000000 0 0.102647098071686
425000000 0 0.089033991391379
450000000 0 0.0772262613556218
475000000 0 0.0669844780601882
500000000 0 0.0581009649079587
Let’s use the PIVOT function to make things a little clearer.
SELECT n
,ROUND([0],4) as [0]
,ROUND([1],4) as [1]
,ROUND([2],4) as [2]
,ROUND([3],4) as [3]
,ROUND([4],4) as [4]
,ROUND([5],4) as [5]
,ROUND([6],4) as [6]
,ROUND([7],4) as [7]
,ROUND([8],4) as [8]
FROM (
SELECT n
,k
,wct.COMBIN(n, k) * POWER(@probw, k) * POWER(1-@probw, n-k) as prob
FROM (
SELECT n * POWER(10, 6) as n
,k
FROM (VALUES (200),(225),(250),(275),(300),(325),(350),(375),(400),(425),(450),(475),(500) ) tickets(n)
CROSS APPLY(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8)) winners(k)
) m
) D
PIVOT(sum(prob) for K in([0],[1],[2],[3],[4],[5],[6],[7],[8])) as P
This produces the following result.
We can see, that even at 500,000,000 entries, there is a 5.8% chance that no one will win, but that the chance of 2 winners (23.5%) or even 3 winners (22.3%) is greater than the chance of there being only one winner.
So, even if we could make the bet on every single combination, it seems likely that we would lose money, as the ticket sales should be extremely high. We could even take this information (maybe next week, if there is no winner) and refine our calculation a little bit to calculate the expected pro rata jackpot amount based on the number of tickets sold.