
Suppose you throw balls one-by-one into $b$ buckets, uniformly at random. At what time does the size of some (any) bucket exceed size $s$?


That is, consider the following random process. At each of times $t=1, 2, 3, \dots$,

  • Pick up a ball (from some infinite supply of balls that you have).
  • Assign it to one of $b$ buckets, uniformly at random, and independent of choices made for previous balls.

For this random process, let $T = T(s,b)$ be the time such that

  • At time $T-1$ (after the $(T-1)$-th ball was assigned), every bucket had at most $s$ balls assigned to it.
  • At time $T$ (after the $T$-th ball was assigned), some bucket has $s + 1$ balls assigned to it.

What can we say about $T$? If we can get the distribution of $T(s,b)$ that would be great, else even knowing its expected value and variance, or even just expected value, would be good.

Beyond the obvious fact that $T \le bs+1$ (and therefore $E[T]$ exists), I don't see anything very helpful. The motivation comes from a real-life computer application involving hashing (the numbers of interest are something like $b = 10000$ and $s = 64$).

  • I don't mind if you add my comments to the answer. (2012-07-20)

1 Answer


I just wrote some code to find the rough answer (for my particular numbers) by simulation.

    $ gcc -lm balls-bins.c -o balls-bins && ./balls-bins 10000 64
    ...
    Mean: 384815.56  Standard deviation: 16893.75  (after 25000 trials)

This (384xxx) is within 2% of the number ~377xxx, specifically

$ T \approx b \left( (s + \log b) \pm \sqrt{(s + \log b)^2 - s^2} \right) $

that comes from the asymptotic results (see comments on the question), and I must say I am pleasantly surprised.

I plan to edit this answer later to summarise the result from the paper, unless someone gets to it first. (Feel free!)

  • p. 116 of _Analytic Combinatorics_ has an exact expression, and some history. (2013-12-13)