Is there a simulation for the Birthday Paradox problem? Something that uses data from Facebook would be ideal.
Tags: probability, math-software, simulation
-
Like [this](http://apps.facebook.com/thebirthdayparadox/)? – 2011-07-30
-
Since birthdays are (probably) not uniformly distributed, there will actually be more "collisions" than in the uniform case. – 2011-07-30
-
@J.M. That looks like it almost completely answers the question. You might as well post it as an answer. – 2011-07-31
-
Write one (assuming a uniform distribution...). – 2011-07-31
-
@Willie: Except that it is completely inaccessible to people who (gasp!) don't use Facebook. :) – 2011-07-31
-
@cardinal: ...and I'm a member of that group. :D (P.S. Dear Google, if you're reading this, thanks bunches for helping me answer at least half this site's questions. Hugs, J.M.) – 2011-07-31
-
@J.M.: As am I, so how did you view it? As for Google, I'm pretty sure it's reading this. :) – 2011-07-31
-
@cardinal: It popped up somewhere around the second page when I typed `"birthday paradox" facebook` into the search box. – 2011-07-31
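Following the suggestion in the comments to start from a uniform distribution, it helps to know the exact answer before simulating, so that simulated frequencies can be checked against it. Ignoring Feb 29 and assuming all 365 days are equally likely, the probability that at least two of $n$ people share a birthday is $1 - \prod_{k=0}^{n-1} \frac{365-k}{365}$. A minimal sketch (the function name `p_shared_uniform` is hypothetical, not from the thread):

```python
from math import prod

def p_shared_uniform(n, days=365):
    """Exact probability that at least two of n people share a birthday,
    assuming each of `days` days is equally likely (Feb 29 ignored)."""
    if n > days:
        return 1.0  # pigeonhole: a match is certain
    # P(all distinct) = (days/days) * ((days-1)/days) * ... * ((days-n+1)/days)
    p_distinct = prod((days - k) / days for k in range(n))
    return 1 - p_distinct

# Smallest group size for which a shared birthday is more likely than not.
threshold = next(n for n in range(1, 366) if p_shared_uniform(n) > 0.5)
print(threshold, round(p_shared_uniform(threshold), 4))
```

This prints `23 0.5073`: 23 people is the smallest group for which a shared birthday is more likely than not under these assumptions.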
2 Answers
As someone in the comments mentioned, some of us weirdos are not on Facebook. For all of the non-Facebook freaks, this one is for you (and me):
There is data at http://www.panix.com/~murphy/bday.html purporting to show that birthdates are not uniformly distributed. Specifically, it covers $n = 480{,}040$ birthdates that fail a $\chi^2$ goodness-of-fit test for uniformity at the 95% confidence level, using the statistic $\chi^2 = \sum_{\text{days}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, where the expected count for each day is $\frac{480{,}040}{365.25}$.
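The test described above can be sketched in Python, with several assumptions labeled up front. The counts are synthetic (drawn from the uniform model itself, so they should usually *pass* the test), not the panix.com data; to reproduce the page's result you would substitute the real per-day counts into `observed`. Feb 29 is treated as a 366th category with expected count $\frac{480{,}040}{4 \cdot 365.25}$, which makes the expected counts sum to $480{,}040$; this is an assumption about how leap days are handled. The 95% critical value uses the Wilson-Hilferty approximation rather than an exact $\chi^2$ quantile. The helper names are hypothetical.

```python
import itertools
import random
from collections import Counter
from statistics import NormalDist

def chi_squared(observed, expected):
    """Pearson statistic: sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_critical(df, alpha=0.05):
    """Approximate upper-alpha critical value of a chi^2 distribution with df
    degrees of freedom (Wilson-Hilferty approximation, not an exact quantile)."""
    z = NormalDist().inv_cdf(1 - alpha)
    c = 2 / (9 * df)
    return df * (1 - c + z * c ** 0.5) ** 3

N = 480_040
# Categories 0..364 are ordinary calendar days; category 365 stands for Feb 29,
# assumed to be expected a quarter as often as any other day.
expected = [N / 365.25] * 365 + [N / (4 * 365.25)]

# SYNTHETIC data drawn from that uniform model; replace with real per-day counts.
rng = random.Random(1)
cum = list(itertools.accumulate([4] * 365 + [1]))  # weights 4 : 1 = 1/365.25 : 1/1461
counts = Counter(rng.choices(range(366), cum_weights=cum, k=N))
observed = [counts[d] for d in range(366)]

stat = chi_squared(observed, expected)
crit = chi2_critical(df=len(observed) - 1)
print(f"chi^2 = {stat:.1f}, 95% critical value ~ {crit:.1f}, "
      f"reject uniformity: {stat > crit}")
```

With real data, a statistic above the critical value is what "failing the test at the 95% level" means.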
If you have access to JSTOR, this link should help describe how to address the birthday problem under more realistic assumptions on the distribution of birthdates: http://www.jstor.org/pss/2685309
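One way to handle non-uniform birthdates exactly (not necessarily the approach taken in the JSTOR article) uses the fact that the probability that all $n$ birthdays are distinct equals $n!\,e_n(p_1, \dots, p_{365})$, where $p_d$ is the probability of day $d$ and $e_n$ is the $n$-th elementary symmetric polynomial. The sketch below builds $j!\,e_j$ incrementally, one day at a time. The `skewed` weights are hypothetical, not real birth data; real per-day counts could be passed in their place. It also lets you check the comment on the question that non-uniform birthdays produce more collisions: the uniform distribution is the one that minimizes the chance of a match.

```python
def p_shared(probs, n):
    """Exact probability that at least two of n people share a birthday when
    each birthday is independently day d with relative frequency probs[d]."""
    total = sum(probs)
    probs = [p / total for p in probs]  # normalize relative frequencies
    # f[j] = j! * e_j(days processed so far); after the loop, f[n] is
    # P(all n birthdays are distinct).
    f = [1.0] + [0.0] * n
    for p in probs:
        for j in range(n, 0, -1):  # descending, so each day is used at most once
            f[j] += j * f[j - 1] * p
    return 1 - f[n]

uniform = [1] * 365
skewed = [2] * 182 + [1] * 183  # HYPOTHETICAL: first 182 days twice as likely
print(round(p_shared(uniform, 23), 4), round(p_shared(skewed, 23), 4))
```

The first number printed is 0.5073, the same as the uniform closed form; the second is larger, as the comment predicts.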
-
Interesting! I suspect that U.S. data for any recent *single* year would show even greater non-uniformity, because of the high Caesarean rate and the fact that doctors prefer to golf on weekends. – 2011-07-31
-
I haven't been able to find any more recent data, though I suspect you're right. I may have to open a (fake) Facebook account to check it out. It would also be interesting to find specific data on those two factors. – 2011-07-31
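Here is a Python answer to your request. It simulates groups of `n` people whose birthdays are drawn uniformly from 365 days, and takes `n` and the number of repetitions as command-line arguments.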
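from sys import argv
import random

def hasAMatch(sample):
    # A set keeps only distinct values, so it is smaller than the sample
    # exactly when at least two birthdays coincide.
    return len(set(sample)) < len(sample)

n = int(argv[1])            # number of people in each group
repetitions = int(argv[2])  # number of simulated groups
days = range(365)           # Feb 29 ignored; every day equally likely

matches = 0
for _ in range(repetitions):
    # Draw n independent birthdays for one group.
    sample = [random.choice(days) for _ in range(n)]
    matches += int(hasAMatch(sample))

print("In a simulation with %d repetitions, each a randomly chosen group of %d people,"
      % (repetitions, n))
print("at least two people shared a birthday %.2f%% of the time."
      % (matches * 100 / repetitions))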
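The script above reports a percentage without saying how precise it is, and the question asked for something that could be driven by real data. A hedged extension (the function `simulate` and its parameters are hypothetical names, not part of the original answer) accepts optional per-day weights, for example birth counts per calendar day from a source like the panix.com table, and reports an approximate 95% confidence interval using the normal approximation to the binomial:

```python
import itertools
import math
import random

def simulate(n, repetitions, weights=None, days=365, seed=None):
    """Estimate P(at least two of n people share a birthday) by simulation.

    weights: optional per-day relative frequencies (e.g. observed birth counts
    per calendar day); None means all `days` days are equally likely.
    Returns (estimate, low, high) with an approximate 95% confidence interval.
    """
    rng = random.Random(seed)
    if weights is not None:
        days = len(weights)
        cum = list(itertools.accumulate(weights))  # computed once, reused per draw
    population = range(days)
    matches = 0
    for _ in range(repetitions):
        if weights is None:
            sample = [rng.randrange(days) for _ in range(n)]
        else:
            sample = rng.choices(population, cum_weights=cum, k=n)
        matches += len(set(sample)) < n  # True counts as 1
    p_hat = matches / repetitions
    # Normal approximation to the sampling error of a binomial proportion.
    half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / repetitions)
    return p_hat, p_hat - half, p_hat + half

est, lo, hi = simulate(23, 10_000, seed=42)
print(f"uniform: {100 * est:.2f}% (95% CI {100 * lo:.2f}% to {100 * hi:.2f}%)")
```

With 10,000 repetitions the interval's half-width is about one percentage point near 50%; quadrupling the repetitions roughly halves it.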