12
$\begingroup$

There are $N$ buckets.

Each second we add one new ball to a random bucket - so at $t=k$, there are a total of $k$ balls collectively in the buckets.

At $t=1$, we expect that at least one bucket contains one ball.

At $t=\sqrt{2N\ln{2}}$, due to birthday paradox, we expect that at least one bucket contains two balls.

.

.

At $t=f(m)$, we expect at least one bucket to contain $m$ balls.

What is the function $f(m)$?

  • 1
    *Someone* wrote a [blog post](http://calculus7.org/2012/01/16/generalized-birthday-problem/) about this.2012-07-16
  • 0
    The birthday paradox is closer to $\sqrt{N\sqrt 2}$. The classic one is $23\approx \sqrt {365 \sqrt 2}$ A naive approach says you have $\frac {N(N-1)}2$ pairs, so when this is of the order of $1$ you should expect a match.2012-08-25
  • 0
    This is called the "balls in bins" problem, and there is huge amount known about it, because it is crucial to understanding the performance of computer hashing algorithms. You might try a web search for "balls in bins" and see what you find out. Or you might enjoy [this](http://ls2-www.cs.uni-dortmund.de/lehre/winter200203/randalg/balls-and-bins.pdf).2012-08-25
  • 0
    This is not $\sqrt{2N\ln(2)}$ to obtain a 50% chance ?2012-08-29
  • 0
    Actually, the probability $p$ that there bucket with more than one ball occurs is first matched or exceeded at $t=\sqrt {2N\ln \frac 1{1-p}}$.2012-08-25

1 Answers 1

2

Firstly, for $m = 2$, the expected time $f(2)$ at which at least one bucket contains two balls is not $t = \sqrt{2N\ln 2}$. That is the time $t$ at which the probability that there is at least one bucket with two balls crosses $\frac12$. The actual expected time $t$ at which at least one bucket contains two balls is instead given by $$\begin{align} 1 + Q(N) &= 1 + 1 + \frac{N-1}{N} + \frac{(N-1)(N-2)}{N^2} + \cdots + \frac{(N-1)(N-2) \cdots 1}{N^{N-1}} \\ &\sim \sqrt{\frac{\pi N}{2}} + \frac{1}{3}+\frac{1}{12}\sqrt{\frac{\pi}{2N}}-\frac{4}{135N}+\cdots \end{align} $$ where the $\sim$ denotes that these are the precise asymptotics.


(Mildly related: this question and the methods of asymptotic analysis at this one.)

For $m = 3$, see this question, where it is shown that the answer is $$ f(3) = \int_0^\infty \left(1+{x\over N}+{x^2\over2N^2}\right)^N \,e^{-x}\,dx $$ which has asymptotics $$ f(3) \sim 6^{1/3}\,\Gamma(4/3)\, N^{2/3} \approx 1.6226\,N^{2/3}. $$


For general $m$, the above question also says that the above answer generalizes to

$$f(m) \sim \sqrt[m]{m!}\ \Gamma(1 + 1/m)\ N^{1-1/m}$$

(for fixed $m$ and asymptotically as $N \to \infty$).