4
$\begingroup$

If I have 5000 Facebook friends, what is the probability that a day with no one having that birthday exists? I assume there are 365 days in a year, and a uniform distribution of the dates of birth.

I guess that it's easier to calculate the opposite, and to subtract it from one.

  • 0
    The probabilities are dramatically different if you include leap years, see my answer.2011-09-15

5 Answers 5

9

Very small. Since it will be even more unlikely that there are two missed dates, we can get an excellent approximation by simply ignoring that possibility. Then there are 365 possibilities for which day is missing, and for each of those the probability of missing it is ${(\frac{364}{365})}^{5000} \approx 1/906576$. The probability of missing some day is then close to 365 times that, or about $1/2484$.

  • 4
    +1 This is the best, and an underappreciated, approach to solving anything like this if you don't need to be accurate to the 10th decimal place.2011-09-15
8

A (usually very good) approximation is based on the procedure called poissonization.

Consider that you have a random number of friends, with a Poisson distribution of parameter $F=5000$ and that there are $J=365$ days in a year. Then each day of the year receives a random number of friends, with a Poisson distribution of parameter $F/J$ and (this is the crucial feature) these are independent.

Then the probability $P$ that at least one day stays free is $1-p^J$ where $p$ is the probability that a Poisson random variable with parameter $F/J$ is at least $1$. One gets $p=1-e^{-F/J}$ and $ P=1-(1-e^{-F/J})^J. $ Numerically, this yields $P=.0410170619 \%$.

The quality of the approximation (which may be quantified) is based on two facts. First, the Poisson model conditionally on the total number of friends coincides with the original model. Second, at least when the parameter $F$ is large, the Poisson random variable is highly concentrated around its mean $F$, hence the conditioning is in fact not necessary.

  • 0
    not convinced yet, but I will think about it. BTW I was thinking in the relation between this poissonization and the use of alternative ensembles in statistical physics (microcanonical / canonical / grand-canonical), if someone is aware of some reference that explores this (direct, I think) conceptual link, I'd like to know about it.2011-09-15
5

You can use the formula that I give in this post. If you substitute $n=365$ and $k=5000$, you calculate the probability $.99959$ of getting all 365 birthdays. The probability of not getting all 365 birthdays is $1-.99959=.00041$, or about 1 in 2500.


Alternatively, note that this is the classical occupancy problem. The exact probability that at least one of the birthdays does not appear is $\sum_{j=0}^n (-1)^j {n\choose j}\left(1-{j\over n}\right)^k$ where we let $n=365$ and $k=5000$. Fortunately, with such a large $k$ we get an excellent approximation by taking only the first 3 or 4 terms in the sum.

2

You can compute it by combinatorics: considering the friends birthdays as (distinc) balls (B=5000) and the days of the years as (distinct) cells (C=365), the total number of ways of placing B balls in C cells is $C^B$. The number of ways of placing them with the restrinction that no cell is empty is given by the number of Stirling of second Kind $S(B,C)$ multiplied by $C!$. Hence, the probability that all cells are ocuppied (no empty birthdays) is

$ P = \frac{C\,! \; S(B,C)}{C^B}$

To get an asymptotics... Didier beat me :-)

2

Let's say for a moment that there are 365 equally likely birthdays and one that is only 25% as likely as the others. Then the probability that no one is born on the less-likely day is

$\left(1-\frac{1}{4\cdot365+1}\right)^{5000}\approx3.25\%$

and otherwise the probability that one is missed is close to

$365\left(1-\frac{1}{365}\right)^{5000}\approx0.04\%$

as Henning Makholm suggests. Combining the two, and being slightly more careful about excluding the leap-day birthdays from the second calculation I get a combined probability of 3.2979% that a birthday is missing, or a 96.7021% that all birthdays are present.

  • 0
    I don't know why you'd think that the change would be small adding in leap day. You're looking $f$or the chance that a day is missed, so a single rare day should dominate the calculations -- as indeed it does.2011-09-15