5
$\begingroup$

If I have $m$ distinct boxes, and $n$ distinct balls. I put all of these balls into the boxes with one box possibly containing more than one balls. What is the expected maximum number of balls in one box?

Appreciate your thoughts on solving this problem.

  • 0
    That's a perfect problem of discrete probability, an area which tends to have a combinatorial flavor.2011-02-13

1 Answers 1

4

This is answered in a paper by Raab and Steger (they switch your $n$ and $m$). The case $n=m$ is simpler and had been known before (see their introduction).

Intuitively, in order to find the "correct" answer, you calculate the expected number of bins $X_t$ which get at least $t$ balls; the value of $t$ such that $X_t \approx 1$ should be "close" to the expectation.

In order to make this rigorous, follow these steps:

  1. Find a critical value $t_0$ such that if $t \ll t_0$ then the probability that a given box gets at least $t$ balls is very close to $1$, and if $t \gg t_0$ then it is very close to $0$.
  2. When $t \gg t_0$, a union bound shows that with high probability no box gets at least $t$ balls.
  3. When $t \ll t_0$, the expected number of boxes with at least $t$ balls is very close to $m$, and so the probability that no box gets at least $t$ balls is very small.
  4. Deduce that most of the probability mass of the expectation is concentrated around $t_0$, and so the expectation itself is close to $t_0$.

When doing the calculations, we hit a problem in step 3: the number of expected boxes with at least $t$ balls is not $m$ but somewhat smaller. Raab and Steger show that the variable $X_t$ is concentrated around its expectation (using the "second moment method", i.e. Chebyshev's inequality), and so $\Pr[X_t = 0]$ is indeed small.

Most of the work is estimating the binomial distribution of the number of balls in each box, finding the correct $t_0$ for each case; the method fails when the number of balls is significantly smaller than the number of bins.


Here is some Sage code:

def analyze_partition(partition, bins):     last = partition[0]     counts = [1]     for part in partition[1:]:         if part != last:             last = part             counts += [1]         else:             counts[-1] += 1     counts.append(bins - sum(counts))     return multinomial(*partition) * multinomial(*counts)  def expected_max(balls, bins):     return sum([max(partition) * analyze_partition(partition, bins)                 for partition in partitions(balls)                 if len(partition) <= bins])/bins^balls 

When plugging $10$ balls and $5$ bins, I get $1467026/390625 \approx 3.76.$

  • 0
    Yes, now it looks even better... In my code I just implemented the usual formula for expectation.2011-02-14