
We throw balls, one at a time and uniformly at random, into $n$ bins until no bin is empty. What is the expected time until no bin is empty?

The solution goes: let $Y$ be the random variable that counts the time until no bin remains empty, and write $Y=\sum_i Y_i$.

Here $Y_i$ is the time from when we first have $(i-1)$ nonempty bins until we have $i$ nonempty bins.

Usually I would think about this as throwing $k$ independent balls into $n$ bins, and there the sample space is clear.

The random variable $Y$ is very intuitive, but how does one formalize it? What is the sample space? And how can we justify the passage from the $k$-balls-into-$n$-bins sample space to the one I'm asking about?

2 Answers


I am a bit confused as to how the $k$-ball limit is imposed here. In the solution below, I assumed that there is an unlimited supply of balls.

After $j-1$ distinct bins have received a ball, each throw lands in a new bin with probability $\frac{n-j+1}{n}$, so the expected time before the $j^{\text{th}}$ distinct bin gets a ball is $\frac{n}{n-j+1}$. Thus, the expected time to fill all $n$ bins is
$$n\sum_{j=1}^n\frac{1}{n-j+1}=n\sum_{j=1}^n\frac{1}{j}.$$
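As a quick sanity check of this formula, here is a small simulation (the helper name `coupon_time` is mine, not from the solution above): it repeatedly throws balls uniformly at random until every bin has been hit and compares the average cover time against $n\sum_{j=1}^n\frac{1}{j}$.

```python
import random

def coupon_time(n, rng):
    """Throw balls uniformly into n bins; return the number of throws
    until every bin has received at least one ball."""
    seen = set()
    throws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        throws += 1
    return throws

n = 10
rng = random.Random(0)          # fixed seed for reproducibility
trials = 20000
avg = sum(coupon_time(n, rng) for _ in range(trials)) / trials

harmonic = n * sum(1 / j for j in range(1, n + 1))  # n * H_n
print(avg, harmonic)  # the two values should be close
```

For $n=10$ the predicted value is $10\,H_{10}\approx 29.29$, and the simulated mean should land within a few tenths of it.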


The argument above uses the linearity of expectation. The part that seems a bit magical at first sight is that $\mathrm{E}[X+Y]=\mathrm{E}[X]+\mathrm{E}[Y]$ even when $X$ and $Y$ are not independent.

Below is an argument that computes all the probabilities and yields the same answer.

The Generalized Principle of Inclusion-Exclusion says

Let $\{S_i\}$ be a finite collection of sets from a finite universe, and let $N_j$ be the sum of the sizes of all intersections of $j$ of the $S_i$; thus, $N_0$ is the size of the universe. Then the number of elements in exactly $k$ of the $S_i$ is
$$\sum_{j\ge k}(-1)^{j-k}\binom{j}{k}N_j.$$
Suppose we've thrown $k$ balls. If we let $S_i$ be the set of arrangements in which bin $i$ is empty, then
$$N_j=\binom{n}{j}(n-j)^k;$$
that is, there are $\binom{n}{j}$ ways to choose the $j$ empty bins and $(n-j)^k$ ways to place the $k$ balls into the remaining $n-j$ bins.
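To see the generalized inclusion-exclusion formula in action, here is a short sketch (my own function name, exact rational arithmetic via `fractions`) that computes the probability of exactly $m$ empty bins after $k$ throws and checks that the probabilities over all $m$ sum to $1$.

```python
from fractions import Fraction
from math import comb

def p_exactly_empty(n, k, m):
    """P(exactly m of the n bins are empty after k uniform throws),
    via sum_{j>=m} (-1)^(j-m) C(j,m) N_j / n^k with N_j = C(n,j)(n-j)^k."""
    return sum(
        (-1) ** (j - m) * comb(j, m) * comb(n, j) * Fraction((n - j) ** k, n**k)
        for j in range(m, n + 1)
    )

n, k = 5, 8
dist = [p_exactly_empty(n, k, m) for m in range(n + 1)]
print(sum(dist))  # prints 1
```

The $m=0$ term is the probability that the $k$ throws cover all bins, i.e. the surjection count $n!\,S(k,n)$ divided by $n^k$.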

Since there are $n^k$ ways to throw $k$ balls into $n$ bins, the probability that exactly $1$ bin is empty after throwing $k$ balls is
$$\frac{1}{n^k}\sum_{j\ge1}(-1)^{j-1}\binom{j}{1}\binom{n}{j}(n-j)^k.$$
The probability that the next ball lands in the last empty bin is $\frac{1}{n}$, and in that case the process ends after $k+1$ balls. Thus, the expected number of balls is
$$\begin{align}
&\frac{1}{n}\sum_{k\ge0}\sum_{j\ge1}(-1)^{j-1}\binom{j}{1}\binom{n}{j}(k+1)(1-j/n)^k\\
&=\frac{1}{n}\sum_{j\ge1}(-1)^{j-1}\binom{j}{1}\binom{n}{j}\left(\frac{n}{j}\right)^2\\
&=\sum_{j\ge1}(-1)^{j-1}\binom{n}{j}\frac{n}{j}\\
&=\sum_{j\ge1}(-1)^{j-1}\sum_{k=0}^n\binom{n-k-1}{j-1}\binom{k}{0}\frac{n}{j}\\
&=\sum_{j\ge1}(-1)^{j-1}\sum_{k=0}^n\binom{n-k}{j}\frac{n}{n-k}\\
&=\sum_{k=0}^{n-1}\frac{n}{n-k}\\
&=n\sum_{k=1}^n\frac{1}{k}
\end{align}$$
where the second line uses $\sum_{k\ge0}(k+1)x^k=\frac{1}{(1-x)^2}$ with $x=1-j/n$.
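The key identity in the middle of the derivation, $\sum_{j\ge1}(-1)^{j-1}\binom{n}{j}\frac{n}{j}=n\sum_{k=1}^n\frac1k$, can be verified exactly with rational arithmetic (function names below are mine, for illustration):

```python
from fractions import Fraction
from math import comb

def alternating_sum(n):
    # sum_{j=1}^{n} (-1)^(j-1) * C(n, j) * n / j
    return sum(
        Fraction((-1) ** (j - 1) * comb(n, j) * n, j) for j in range(1, n + 1)
    )

def n_harmonic(n):
    # n * H_n = n * sum_{k=1}^{n} 1/k
    return n * sum(Fraction(1, k) for k in range(1, n + 1))

for n in range(1, 12):
    assert alternating_sum(n) == n_harmonic(n)
print("identity holds for n = 1..11")
```

Using `Fraction` avoids any floating-point doubt: both sides agree term for term as exact rationals.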


The theoretical approach is to consider a sample space that consists of all infinite sequences of ball throws. For each such sequence, $Y$ is then the length of the shortest prefix that hits all bins. There are some sequences for which this is not well defined because some bin is never hit, but the probability of the set of those sequences is $0$ (that's the Infinite Monkey Theorem), so we can just ignore them, or let $Y$ take some dummy value in those cases.
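This definition of $Y$ translates directly into code: given a (possibly infinite) stream of throws, $Y$ is the first time the set of bins seen so far reaches size $n$. A small sketch, with names of my own choosing:

```python
import itertools
import random

def first_cover_time(throws, n):
    """Length of the shortest prefix of `throws` that hits all n bins.
    `throws` may be an infinite iterator; with probability 1 this halts."""
    seen = set()
    for t, bin_index in enumerate(throws, start=1):
        seen.add(bin_index)
        if len(seen) == n:
            return t
    return None  # only reachable if a finite sequence misses some bin

# A lazily generated "infinite" sequence of uniform throws into 4 bins:
n = 4
rng = random.Random(1)
infinite_throws = (rng.randrange(n) for _ in itertools.count())
print(first_cover_time(infinite_throws, n))
```

The generator never materializes the whole sequence; the function just consumes the prefix that the definition of $Y$ refers to.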

Because the sample space is infinite, the combinatorial principle of counting favorable outcomes and dividing by total outcomes won't work here; instead we need to move to the more abstract framework of probability measures. However, with some cleverness one can get a lot of mileage out of the assumption that successive throws are independent, even without worrying too much about the theoretical basis. See the coupon collector's problem for derivations in this particular case.