*What are the chances that any two people in the room have the same birthday?*

As the holidays approach, you may find yourself in a room filled with celebratory people. If you are anything like me, you will survey the room and ask yourself, what are the chances that any two people in this room have the same birthday? Not the same birthday as me, and not the same birth date, just the same birthday.

Let’s say that there are 30 people in the room and, to make it more interesting, I decide to place a bet that there are 2 people in the room (and I of course will disclaim any prior knowledge about birthdays) that have the same birthday. It would be even better if I could get odds, let’s say 2 to 1, but I would take even money. What are the odds?

The answer to this question is a classic probability problem called the birthday paradox, though it’s not really a paradox, it’s more of a puzzle since the answer seems to be so far removed from what intuition tells us. And the answer is that if there are 30 people in the room, the probability that 2 of them have the same birthday is 70.6%, a better than even-money bet.

How can this be? Let’s frame the problem the following way. If there are two people in the room, then the first person’s birthday can fall on any of 365 days (we are going to ignore leap years). For the second person’s birthday to not be the same day it can only fall on any of 364 days. If we add a third person, then his birthday can only fall on one of 363 days. To calculate the probability of them not having the same birthday we would use the following equation:

1 – (365/365 * 364/365 * 363/365) = 0.82%

So, in a room with 3 people in, the probability that 2 of them have the same birthday is less than 1% (0.82%).

It turns out that this is quite easy to calculate in SQL Server using common table expressions (CTE) and the XleratorDB **PRODUCT** function. First, we create a CTE that consists of the number of people and the number of unique days available and we add a select at the end to display the contents of the CTE:

;with birthdays as

(

select cast(1 as float) as people

,365 as num_bd

union all

select people + 1

,num_bd - 1

from birthdays

where people < 366

)

select *

from birthdays

where people < 4

option (maxrecursion 366)

which returns the following:

people num_bd

---------------------- -----------

1 365

2 364

3 363

(3 row(s) affected)

If we then just change the select statement, we can calculate the probability:

select 1 - wct.product(cast(num_bd as float)/365)

from birthdays

where people < 4

option (maxrecursion 366)

which returns the following result:

----------------------

0.00820416588478134

(1 row(s) affected)

It turns out that with just a minor adjustment, we can return the probability for any number of people, though we never have to do more than 366, because at that point the probability is 1. We will construct a CTE from the birthday CTE and then select the probabilities at 10, 20, 30, 40 and 50 people.

;with birthdays as

(

select cast(1 as float) as people

,365 as num_bd

union all

select people + 1

,num_bd - 1

from birthdays

where people < 366

), bpd as

(

select b.people

,(1 - wct.product(cast(a.num_bd as float)/365)) as prob

from birthdays a, birthdays b

where a.people between 1 and b.people

group by b.people

)

select people, prob

from bpd

where people in (10,20,30,40,50)

option (maxrecursion 366)

which returns the following result:

people prob

---------------------- ----------------------

10 0.116948177711078

20 0.41143838358058

30 0.706316242719269

40 0.891231809817949

50 0.970373579577988

Now, we can ask ourselves, at what point does this become an even-money bet? In other words, at what point does the probability become greater than .5? We can find this simply by changing our select statement to find the smallest number of people where the probability is > .5. Using the same CTE:

select min(people) from bpd

where prob > .50

option (maxrecursion 366)

which returns the following result:

----------------------

23

(1 row(s) affected)

At 23 people it becomes an even-money bet. Obviously we could simply test for .99 instead of .50 to see how many people would need to be in the room for the probability to be greater than 99%:

select min(people) from bpd

where prob > .99

option (maxrecursion 366)

which returns the following result:

----------------------

57

(1 row(s) affected)

Finally, I have included this select to show all the points of interest we have talked about and their probabilities:

select people, prob

from bpd

where people in (3, 10,20,27,30,40,50,57)

option (maxrecursion 366)

which returns the following result:

people prob

---------------------- ----------------------

3 0.00820416588478134

10 0.116948177711078

20 0.41143838358058

27 0.626859282263242

30 0.706316242719269

40 0.891231809817949

50 0.970373579577988

57 0.99012245934117

(8 row(s) affected)

The next time that you are at a gathering and people discover that they have the same birthday, and they say ‘what a coincidence!’ you will know better. It wasn’t fate; it was math.

Happy Thanksgiving!

Archive

Monthly

Go

| |||||||||

Sun | Mon | Tue | Wed | Thu | Fri | Sat | |||
---|---|---|---|---|---|---|---|---|---|

26 | 27 | 28 | 1 | 2 | 3 | 4 | |||

5 | 6 | 7 | 8 | 9 | 10 | 11 | |||

12 | 13 | 14 | 15 | 16 | 17 | 18 | |||

19 | 20 | 21 | 22 | 23 | 24 | 25 | |||

26 | 27 | 28 | 29 | 30 | 31 | 1 | |||

2 | 3 | 4 | 5 | 6 | 7 | 8 |

Go