1

Is there a simulation for the Birthday Paradox problem? Something that uses data from Facebook would be ideal.

  • 4
    Like [this](http://apps.facebook.com/thebirthdayparadox/)? 2011-07-30
  • 0
    Since birthdays are (probably) not uniformly distributed, there will actually be more "collisions" than in the uniform case. 2011-07-30
  • 1
    @JM. That looks like it almost completely answers the question. You might as well put it as an answer. 2011-07-31
  • 0
    Write one (assuming uniform distribution...) 2011-07-31
  • 1
    @Willie: Except that it is completely inaccessible for people that (gasp!) don't use Facebook. :) 2011-07-31
  • 5
    @cardinal: ...and I'm a member of that group. :D (P.S. Dear Google, if you're reading this, thanks bunches for helping me answer at least half this site's questions. Hugs, J.M.) 2011-07-31
  • 1
    @J.M., As am I, so how did you view it? As for Google, I'm pretty sure it's reading this. :) 2011-07-31
  • 0
    @cardinal: It popped up somewhere around the second page when I typed `"birthday paradox" facebook` into the search box. 2011-07-31

2 Answers

4

As someone in the comments mentioned, there are some of us weirdos who are not on Facebook. For all of the non-Facebook freaks, this one is for you (me):

There is data at http://www.panix.com/~murphy/bday.html purporting to show that birthdates are not uniformly distributed. Specifically, the data cover $n = 480{,}040$ birthdates and fail a $\chi^2$ goodness-of-fit test for uniformity at the 95% level of confidence. The statistic is $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, summed over the days of the year, where the expected number per day is $\frac{480{,}040}{365.25}$.
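To make the test above concrete, here is a minimal sketch of the $\chi^2$ computation in Python. It does not use the linked data: the daily counts are synthetic (a made-up 10% weekday bump), the year is simplified to 365 equally likely days rather than 365.25, and the critical value comes from the Wilson-Hilferty approximation instead of an exact table. The names `chi_square_uniform` and `chi_square_critical` are mine.

```python
import random
from statistics import NormalDist


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts per day."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)


def chi_square_critical(df, level=0.95):
    """Approximate chi-square critical value (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(level)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


# Synthetic, hypothetical data: 5 of every 7 days are 10% more likely.
rng = random.Random(0)
weights = [1.1 if d % 7 < 5 else 1.0 for d in range(365)]
counts = [0] * 365
for day in rng.choices(range(365), weights=weights, k=100_000):
    counts[day] += 1

stat = chi_square_uniform(counts)
crit = chi_square_critical(len(counts) - 1)
print("chi-square = %.1f, approximate 95%% critical value = %.1f" % (stat, crit))
print("reject uniformity" if stat > crit else "cannot reject uniformity")
```

With real data you would replace `counts` with the observed number of births on each day of the year.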

If you have access to JSTOR, this link should help describe how to address the birthday problem under more realistic assumptions on the distribution of birthdates: http://www.jstor.org/pss/2685309
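If you cannot get at the paper, one rough way to explore non-uniform birthdates is a weighted Monte Carlo simulation. The sketch below is not the method from the JSTOR article, and the `skewed` weights are invented purely for illustration; with real data you would plug in the observed daily frequencies. The function name `match_probability` is mine.

```python
import random


def match_probability(n, weights, repetitions=20_000, seed=0):
    """Estimate P(at least two of n people share a birthday) for weighted days."""
    rng = random.Random(seed)
    days = range(len(weights))
    hits = 0
    for _ in range(repetitions):
        sample = rng.choices(days, weights=weights, k=n)
        if len(set(sample)) < n:  # fewer distinct days than people: a match
            hits += 1
    return hits / repetitions


uniform = [1.0] * 365
# Hypothetical skew: the first 100 days are twice as likely as the rest.
skewed = [2.0 if d < 100 else 1.0 for d in range(365)]

print("uniform:", match_probability(23, uniform))
print("skewed: ", match_probability(23, skewed))
```

In runs like this the skewed estimate should come out higher than the uniform one, in line with the comment on the question that non-uniform birthdays produce more collisions.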

  • 2
    Interesting! I suspect that U.S. data for any recent *single* year would show even greater non-uniformity, because of the high Caesarian rate, and the fact that doctors prefer to golf on weekends. 2011-07-31
  • 0
    I haven't been able to find any more recent data, tho I suspect you're right. I may have to open a (fake) account in Facebook to check it out. It would be interesting to find specific data on those two factors, too, though. 2011-07-31
1

Here is a pythonic answer to your request.

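# Usage: python birthday.py <group size> <repetitions>   (script name is a placeholder)
from sys import argv
import random

def hasAMatch(sample):
    # A match exists when there are fewer distinct birthdays than people.
    return len(set(sample)) < len(sample)

n = int(argv[1])            # number of people in each group
repetitions = int(argv[2])  # number of simulated groups
days = range(365)           # uniform birthdays, ignoring leap years
matches = 0
for _ in range(repetitions):
    # Draw n birthdays independently and uniformly at random.
    sample = [random.choice(days) for _ in range(n)]
    matches += int(hasAMatch(sample))
print("In a simulation with %d repetitions, each using a randomly chosen" % repetitions)
print("sample of %d people, we had a matching birthday %s%% of the time." % (n, float(matches) * 100 / repetitions))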
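As a sanity check on the simulation, the uniform case also has a closed form: with 365 equally likely days, the probability that $n$ people all have distinct birthdays is $\prod_{i=0}^{n-1} \frac{365 - i}{365}$, and the match probability is one minus that. Here is a small sketch (the function name `exact_match_probability` is mine, and it needs Python 3.8+ for `math.prod`):

```python
from math import prod


def exact_match_probability(n, days=365):
    """Exact P(at least one shared birthday) for n people, uniform birthdays."""
    if n > days:
        return 1.0  # pigeonhole: more people than days forces a match
    return 1.0 - prod((days - i) / days for i in range(n))


for group_size in (10, 23, 50):
    print(group_size, round(exact_match_probability(group_size), 4))
```

The simulated percentage should approach these values as the number of repetitions grows; `n = 23` is the smallest group for which the probability exceeds one half.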