4
$\begingroup$

If I have 5000 Facebook friends, what is the probability that a day with no one having that birthday exists? I assume there are 365 days in a year, and a uniform distribution of the dates of birth.

I guess that it's easier to calculate the opposite, and to subtract it from one.

  • 0
    a year is 365 days, and a uniform repartition of the births2011-09-14
  • 15
    Off topic: If you have 5000 FB friends, you have bigger problems :)2011-09-14
  • 3
    @percusse: I guess the OP wants to know how big his problems are, by checking if he has at least one day each year that he doesn't have to go to a birthday party :)2011-09-14
  • 0
    I would say that the result is 1-C(5000, 365) which is basically 0.2011-09-14
  • 5
    If by $C(5000,365)$ you mean the binomial coefficient "5000 choose 365", which is a big integer, then $1-C(5000,365)$ is a big negative integer, which is neither a probability nor "basically 0".2011-09-14
  • 0
    i forgot to devide by the total amount of possibilities right. It's late2011-09-14
  • 1
    Do we have to take in account if someone's birthday is on the 29th of feburary and we are not in a leap year?2011-09-14
  • 1
    A formal expression for the probability can be written using the multinomial coefficient: $$1-\frac{1}{365^{5000}}\sum_{d}{5000\choose d_1,\cdots,d_{365}},$$ where the sum is over indices $d_i\ge1$ with $d_1+\cdots+d_{365}=5000$. (This ignores February 29th however.)2011-09-14
  • 0
    The probabilities are dramatically different if you include leap years, see my answer.2011-09-15

5 Answers 5

9

Very small. Since it will be even more unlikely that there are two missed dates, we can get an excellent approximation by simply ignoring that possibility. Then there are 365 possibilities for which day is missing, and for each of those the probability of missing it is ${(\frac{364}{365})}^{5000} \approx 1/906576$. The probability of missing some day is then close to 365 times that, or about $1/2484$.

  • 0
    why you don't find the probability of no missed dates ?2011-09-14
  • 0
    Because it's easier to just find (well, approximate) the probability of missing one of them. (Which was, after all, what was asked for).2011-09-14
  • 4
    +1 This is the best, and an underappreciated, approach to solving anything like this if you don't need to be accurate to the 10th decimal place.2011-09-15
8

A (usually very good) approximation is based on the procedure called poissonization.

Consider that you have a random number of friends, with a Poisson distribution of parameter $F=5000$ and that there are $J=365$ days in a year. Then each day of the year receives a random number of friends, with a Poisson distribution of parameter $F/J$ and (this is the crucial feature) these are independent.

Then the probability $P$ that at least one day stays free is $1-p^J$ where $p$ is the probability that a Poisson random variable with parameter $F/J$ is at least $1$. One gets $p=1-e^{-F/J}$ and $$ P=1-(1-e^{-F/J})^J. $$ Numerically, this yields $P=.0410170619 \%$.

The quality of the approximation (which may be quantified) is based on two facts. First, the Poisson model conditionally on the total number of friends coincides with the original model. Second, at least when the parameter $F$ is large, the Poisson random variable is highly concentrated around its mean $F$, hence the conditioning is in fact not necessary.

  • 0
    +1 However, the small probability we are computing here lies in a "improbable zone", hence we are not so sure that it will be a good approximation.2011-09-14
  • 1
    @leonbloy: No, you are confusing the event that one box stays empty (which is improbable), with the overall population in the Poisson model (where it is random) and in the multinomial one (where it is random). The regime of overall populations where we use the Poisson model is *typical*, not *improbable*. (Thanks for the appreciation.)2011-09-14
  • 0
    not convinced yet, but I will think about it. BTW I was thinking in the relation between this poissonization and the use of alternative ensembles in statistical physics (microcanonical / canonical / grand-canonical), if someone is aware of some reference that explores this (direct, I think) conceptual link, I'd like to know about it.2011-09-15
5

You can use the formula that I give in this post. If you substitute $n=365$ and $k=5000$, you calculate the probability $.99959$ of getting all 365 birthdays. The probability of not getting all 365 birthdays is $1-.99959=.00041$, or about 1 in 2500.


Alternatively, note that this is the classical occupancy problem. The exact probability that at least one of the birthdays does not appear is $$\sum_{j=0}^n (-1)^j {n\choose j}\left(1-{j\over n}\right)^k$$ where we let $n=365$ and $k=5000$. Fortunately, with such a large $k$ we get an excellent approximation by taking only the first 3 or 4 terms in the sum.

2

You can compute it by combinatorics: considering the friends birthdays as (distinc) balls (B=5000) and the days of the years as (distinct) cells (C=365), the total number of ways of placing B balls in C cells is $C^B$. The number of ways of placing them with the restrinction that no cell is empty is given by the number of Stirling of second Kind $S(B,C)$ multiplied by $C!$. Hence, the probability that all cells are ocuppied (no empty birthdays) is

$$ P = \frac{C\,! \; S(B,C)}{C^B}$$

To get an asymptotics... Didier beat me :-)

2

Let's say for a moment that there are 365 equally likely birthdays and one that is only 25% as likely as the others. Then the probability that no one is born on the less-likely day is

$$\left(1-\frac{1}{4\cdot365+1}\right)^{5000}\approx3.25\%$$

and otherwise the probability that one is missed is close to

$$365\left(1-\frac{1}{365}\right)^{5000}\approx0.04\%$$

as Henning Makholm suggests. Combining the two, and being slightly more careful about excluding the leap-day birthdays from the second calculation I get a combined probability of 3.2979% that a birthday is missing, or a 96.7021% that all birthdays are present.

  • 0
    In other words, you think that a consequence of the adjunction of a February 29th roughly every four years is to multiply the answer by about 80... Not sure I concur.2011-09-15
  • 0
    @Didier Piau: I ran a Monte Carlo simulation with 10,000 trials (50 million birthdays) and found 3.29%, which is in excellent agreement with my result. (Actually that's better than I'd expect by chance, but that happens some times. I'm running 10^5 trials now; I don't expect to be quite so lucky but it should be much closer to 3% than 0.05%.)2011-09-15
  • 0
    100,000 trials gives 3.293%. I'd expect it to fall in the range 3.24% to 3.36% if my figure was correct.2011-09-15
  • 0
    Simulations are an interesting complement, sure, but one expects the final result to change only marginally when one adds 1/4 of a day to a 365-days year, so to speak. The fact that yours is multiplied by 80 is a sure sign that something is wrong in your computations (not to mention the fact that others on this page proved the result to be about .04%).2011-09-15
  • 0
    I don't know why you'd think that the change would be small adding in leap day. You're looking for the chance that a day is missed, so a single rare day should dominate the calculations -- as indeed it does.2011-09-15