
I've got a balls and bins problem. Suppose I throw $n$ balls uniformly at random into $n$ bins. What is the probability that exactly $k$ bins end up with exactly $1$ ball?

I know this is a classical problem and may look "simple" or "naive," but I've worked on it for days and still can't get the answer.

However, I think I do have a good approximation for it. Namely, let $X$ denote the number of such bins. Then

$$ Pr(X=k) \approx \binom{n}{k}\left(\frac{1}{e}\right)^{k}\left(1-\frac{1}{e}\right)^{n-k} $$

where $1/e$ is an approximation for $(1-1/n)^{n-1}$.

This approximation works well when $n$ is large, but poorly when $n$ is small (e.g. $n<5$).

Anyway, I'm looking for an exact expression. Anyone have an idea?

PS: I've written a simple simulation in C++; you can check your answer with it first: Simulation Code Here.
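For readers without the C++ code handy, here is a minimal Python sketch of such a simulation (my own; the function names and parameters are arbitrary), which also tabulates the approximation above:

```python
import math
import random

def simulate(n, trials=50_000, seed=1):
    """Throw n balls into n bins uniformly at random, `trials` times;
    return the empirical distribution of X = #bins with exactly one ball."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        bins = [0] * n
        for _ in range(n):
            bins[rng.randrange(n)] += 1
        counts[bins.count(1)] += 1
    return [c / trials for c in counts]

def approx(n, k):
    """The binomial approximation C(n,k) (1/e)^k (1 - 1/e)^(n-k)."""
    p = 1 / math.e
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

emp = simulate(20)
for k in range(4, 11):
    print(f"k={k:2d}: simulated {emp[k]:.4f}  approx {approx(20, k):.4f}")
```

For $n=20$ the two columns already agree closely; for small $n$ (try `simulate(3)`) the gap is much larger, as noted above.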

  • The number of balls in each bin is binomially distributed with parameters $n$ and $1/n$. If we assume that the numbers of balls in the bins are independent, we get your approximation. (2011-11-03)
  • @MichaelLugo You are correct! I forgot to mention that :) (2011-11-03)
  • @MichaelLugo: It can also be seen as a "Poissonization" approach. Assume $n$ independent Poissons with mean $1$; this model, conditioned on the event that the sum equals its expected value ($n$), is exactly equivalent to the original. (2011-11-03)
  • @leonbloy Good point! This reminds me of the Poisson Approximation section of the [Probability and Computing](http://www.amazon.com/dp/0521835402) book. However, one point still confuses me: is the Poisson approach exactly equal to the original case, or is it off by at most a factor of $e \sqrt{n}$? If the approximation can be exactly equal to the original, then the observation that it is poor when $n$ is small would be because the Poisson distribution cannot accurately approximate the binomial distribution when $n$ is small, right? (2011-11-04)
  • @leonbloy I just read the Poisson Approximation section again. It says that to make the Poisson case exactly equal to the original, we need to condition on the sum of the Poisson r.v.'s being $n$. In my approximation, however, I only ensured that the expectation of that sum is $n$. But on the other hand, the approximation is so close when $n$ is big. Hmm, I'm really confused... Any idea? (2011-11-04)
  • @vistb: The Poisson model is not exactly equal to the true model because the condition (the sum of the i.i.d. Poissons equals $n$) does not hold; the sum is random, not constant. But by the law of large numbers, when $n$ is large the sum concentrates around its expected value, so the conditioning is "almost" true. (2011-11-04)
  • @leonbloy OK, I see, thank you for your help :) (2011-11-04)
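The Poissonization claim in these comments can be tested empirically. The sketch below (my own code; it samples Poisson(1) with Knuth's product-of-uniforms method and imposes the conditioning by rejection) compares the conditioned Poisson model with direct balls-into-bins sampling:

```python
import math
import random

rng = random.Random(7)
N = 10            # number of balls = number of bins (kept small for speed)
SAMPLES = 20_000  # accepted samples per model

def poisson1():
    """Sample Poisson(1) via Knuth's product-of-uniforms method."""
    limit, k, p = math.exp(-1.0), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def dist_conditioned():
    """X under i.i.d. Poisson(1) bin counts, conditioned (by rejection) on sum == N."""
    counts = [0] * (N + 1)
    accepted = 0
    while accepted < SAMPLES:
        bins = [poisson1() for _ in range(N)]
        if sum(bins) == N:                    # keep only outcomes with total N
            counts[bins.count(1)] += 1
            accepted += 1
    return [c / SAMPLES for c in counts]

def dist_direct():
    """X from throwing N balls into N bins uniformly at random."""
    counts = [0] * (N + 1)
    for _ in range(SAMPLES):
        bins = [0] * N
        for _ in range(N):
            bins[rng.randrange(N)] += 1
        counts[bins.count(1)] += 1
    return [c / SAMPLES for c in counts]

a, b = dist_conditioned(), dist_direct()
print(max(abs(x - y) for x, y in zip(a, b)))  # small: the two models agree
```

The maximum difference between the two empirical distributions is on the order of sampling noise, consistent with leonbloy's point that the conditioned model is exactly the original.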

3 Answers


I am not sure about what I'm saying, but perhaps this could turn out to be true:

Count the number of possibilities that give rise to $X \ge k$. You have to choose which bins get $1$ ball, and then choose which balls go into those bins. This gives you $\binom{n}{k}\binom{n}{k}$ possibilities. EDIT, prompted by a good comment =) : you also have to choose which ball goes into which bin, hence an extra factor of $k!$.

Now the remaining $n-k$ balls go anywhere in the $n-k$ bins left, which gives an extra factor of $(n-k)^{n-k}$. The total number of possibilities is just $n^n$, hence $$ \mathbb P(X\ge k) = \frac{ \binom{n}{k}\binom{n}{k} k! \, (n-k)^{n-k} }{n^n}. $$ To get $\mathbb P(X=k)$, just compute $\mathbb P(X \ge k) - \mathbb P(X \ge k+1)$.

I don't know whether our formulas (yours and mine) are asymptotically equivalent, but if you're more interested in this question than I am, you could try working it out. =)

Hope that helps,

P.S. After reading the comments, my argument feels wrong, but I'm still going to leave it here for readers.
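To see the over-counting concretely, here is a small brute-force check (my own sketch, not part of the answer): for $n=3$, $k=1$ the proposed formula even exceeds $1$.

```python
from itertools import product
from math import comb, factorial
from fractions import Fraction

def true_p_at_least(n, k):
    """Exact P(X >= k), by enumerating all n^n equally likely assignments."""
    hits = 0
    for assign in product(range(n), repeat=n):          # assign[ball] = bin
        singletons = sum(1 for b in range(n) if assign.count(b) == 1)
        if singletons >= k:
            hits += 1
    return Fraction(hits, n**n)

def proposed(n, k):
    """The formula above: C(n,k)^2 * k! * (n-k)^(n-k) / n^n."""
    return Fraction(comb(n, k)**2 * factorial(k) * (n - k)**(n - k), n**n)

print(true_p_at_least(3, 1))   # 8/9
print(proposed(3, 1))          # 4/3 -- greater than 1, so outcomes are counted twice
```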

  • How do you ensure that none of the remaining $n-k$ bins ends up with exactly one ball? (2011-11-03)
  • Once you've chosen the balls and bins, you also have to choose which ball goes into which bin; that's another factor of $k!$. And Michael Lugo's comment applies too; I think your approach leads to some double-counting. (2011-11-03)
  • Probably, and that's why I said "I am not sure about what I'm saying, but perhaps this could turn out to be true" =) But I don't know how to remove the double-counting with such an argument; it seems pretty hard, because the same arrangement can be produced by two different choices. (2011-11-03)
  • Perhaps counting $\mathbb P(X \ge k, X_i = 1)$, where $X_i$ is the number of balls in the $i$-th bin, would lead to a computation that doesn't double-count; I may try that later. (2011-11-03)
  • I think to fix your argument you would need to apply inclusion-exclusion along the lines of robjohn's answer. His $N_j$ is the combinatorial version of your $P(X \geq k)$. (2011-11-04)

An exact formula can be obtained using de Moivre's formula for the occurrence of exactly $k$ among $n$ exchangeable events:

$$ p_n(k) = \binom{n}{k} \sum_{i=k}^n (-1)^{i-k} \binom{n-k}{i-k} \mathbb{P}(A_1 \ldots A_i) $$

Here $A_1$ is the event that the first bin contains exactly $1$ ball, $A_1 A_2$ is the event that the first two bins each contain exactly $1$ ball, and so on.

$$ \begin{eqnarray} \mathbb{P}(A_1) &=& \binom{n}{1} \frac{1}{n} \left( 1 - \frac{1}{n} \right)^{n-1} = \left( 1 - \frac{1}{n} \right)^{n-1} \\ \mathbb{P}(A_1 A_2) &=& \binom{n}{1,1,n-2} \frac{1}{n^2} \left( 1- \frac{2}{n} \right)^{n-2} = \frac{n-1}{n} \left( 1- \frac{2}{n} \right)^{n-2} \\ \mathbb{P}(A_1 \ldots A_i) &=& \binom{n}{i} i! \frac{1}{n^i} \left( 1 - \frac{i}{n} \right)^{n-i} \end{eqnarray} $$

Hence the exact result you seek is: $$ p_n(k) = \binom{n}{k} \sum_{i=k}^n (-1)^{i-k} \binom{n-k}{i-k} \binom{n}{i} i! \frac{1}{n^i} \left( 1 - \frac{i}{n} \right)^{n-i} $$

This agrees with the simulations.

[Plot comparing $p_n(k)$ with simulation results]
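The formula can also be checked in exact rational arithmetic. Below is a short Python sketch of mine (not part of the original answer), using the identity $\frac{1}{n^i}\left(1-\frac{i}{n}\right)^{n-i}=\frac{(n-i)^{n-i}}{n^n}$ to stay in integers, compared against brute-force enumeration:

```python
from fractions import Fraction
from math import comb, factorial
from itertools import product

def p_exact(n, k):
    """p_n(k) as above; n^-i * (1 - i/n)^(n-i) is rewritten as (n-i)^(n-i) / n^n."""
    s = sum((-1)**(i - k) * comb(n - k, i - k) * comb(n, i) * factorial(i)
            * Fraction((n - i)**(n - i), n**n)
            for i in range(k, n + 1))
    return comb(n, k) * s

def p_brute(n, k):
    """Exact Pr(X = k) by enumerating all n^n equally likely assignments."""
    hits = 0
    for assign in product(range(n), repeat=n):          # assign[ball] = bin
        if sum(1 for b in range(n) if assign.count(b) == 1) == k:
            hits += 1
    return Fraction(hits, n**n)

n = 4
print([str(p_exact(n, k)) for k in range(n + 1)])   # ['5/32', '3/16', '9/16', '0', '3/32']
```

The two functions agree for every $k$, and the probabilities sum to $1$.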

  • +1: *Very* nice! A minor correction, though, and a comment: 1) $A_1$ is the event that the *first* bin contains one ball, $A_1 A_2$ is the event that bins *1 and 2* each contain one ball, and so forth. 2) $P(A_1 \cdots A_i)$ might be more simply expressed as $\binom{n}{i} i! (n-i)^{n-i}/n^n$; i.e., the number of ways to pick $i$ balls to go into bins $1, \ldots, i$, times the number of ways to distribute exactly one of those balls to each bin, times the number of ways to distribute $n-i$ balls to the remaining $n-i$ bins, divided by the total number of ways to distribute $n$ balls to $n$ bins. (2011-11-03)
  • @MikeSpivey Thanks for the comment. I was looking at $\mathbb{P}(A_1 \ldots A_i)$ as a multinomial probability. (2011-11-03)
  • @Sasha Your answer looks interesting. However, I have never heard of the notions of _exchangeable events_ or _de Moivre's formula_; could you please point me to more material on them? I've also found the book you linked in our library; will it help me? (2011-11-03)
  • @vistb Yes, I learned about it from the highly recommended book "Problems and Snapshots from the World of Probability" I linked to. Exchangeability means that $\mathbb{P}(A_1) = \mathbb{P}(A_k)$, i.e. the probability that the first bin contains exactly one ball equals the probability that the $k$-th bin contains exactly one ball. De Moivre's formula is related to the inclusion-exclusion principle; see robjohn's post. (2011-11-03)
  • @MikeSpivey Yes, you are right! I have edited the description of $A_k$ in the post. (2011-11-03)
  • Have you tried writing down the two-variable generating function $\sum_{n=0}^\infty \sum_{k=0}^n p_n(k) x^n y^k$ to see whether there's a simpler expression for $p_n(k)$? (Possibly the right generating function has factors of $1/n!$ and/or $1/k!$ to make it simpler.) (2011-11-04)
  • @Sasha I think I'm going to choose robjohn's answer as the accepted one. It uses simpler ideas (at least for me). But yours is also very nice; after all, I can only pick one, sorry :( Hope you don't blame me... (2011-11-04)
  • @vistb Of course, you should choose the answer that helped _you_ most. (2011-11-04)

Let $S_i$ be the set of outcomes with exactly $1$ ball in bin $i$. Let $N_j$ be the sum of the numbers of outcomes in the intersections of $j$ of the $S_i$; e.g. $N_3=\sum_{i_1<i_2<i_3}|S_{i_1}\cap S_{i_2}\cap S_{i_3}|$. Here $N_j=\binom{n}{j}\binom{n}{j}j!\,(n-j)^{n-j}$: choose $j$ bins, choose $j$ balls, match them up bijectively, and let the remaining $n-j$ balls land anywhere in the other $n-j$ bins. By the Generalized Inclusion-Exclusion Principle, the probability that exactly $k$ bins contain exactly one ball is $$ \begin{align} \frac{1}{n^n}\sum_{j=k}^n(-1)^{j-k}\binom{j}{k}N_j &=\frac{1}{n^n}\sum_{j=k}^n(-1)^{j-k}\binom{j}{k}\binom{n}{j}\binom{n}{j}j!(n-j)^{n-j}\tag{1}\\ &=\frac{1}{n^n}\sum_{j=k}^n(-1)^{j-k}\binom{n}{k}\binom{n-k}{n-j}\frac{n!}{(n-j)!}(n-j)^{n-j}\\ &=\binom{n}{k}\frac{n!}{n^n}\sum_{j=k}^n(-1)^{j-k}\binom{n-k}{n-j}\frac{(n-j)^{n-j}}{(n-j)!}\\ &=\binom{n}{k}\frac{n!}{n^n}\sum_{j=0}^{n-k}(-1)^{n-k-j}\binom{n-k}{j}\frac{j^j}{j!} \end{align} $$

Appendix:

To verify that the probabilities for $k=0,1,\dots,n$ sum to $1$, $(1)$ can be summed fairly easily in $k$: $$ \begin{align} &\sum_{k=0}^n\frac{1}{n^n}\sum_{j=k}^n(-1)^{j-k}\binom{j}{k}\binom{n}{j}\binom{n}{j}j!(n-j)^{n-j}\\ &=\frac{1}{n^n}\sum_{j=0}^n\sum_{k=0}^j(-1)^{j-k}\binom{j}{k}\binom{n}{j}\binom{n}{j}j!(n-j)^{n-j}\\ &=\frac{1}{n^n}\sum_{j=0}^n(-1)^{j}0^j\binom{n}{j}\binom{n}{j}j!(n-j)^{n-j}\\ &=\frac{1}{n^n}(-1)^00^0\binom{n}{0}\binom{n}{0}0!(n-0)^{n-0}\\ &=1 \end{align} $$
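The closed form in the last line is easy to evaluate in exact arithmetic; here is a small Python sketch of mine checking it against hand-enumerated values for $n=3$:

```python
from fractions import Fraction
from math import comb, factorial

def p(n, k):
    """C(n,k) * n!/n^n * sum_{j=0}^{n-k} (-1)^(n-k-j) * C(n-k,j) * j^j / j!"""
    s = sum((-1)**(n - k - j) * comb(n - k, j) * Fraction(j**j, factorial(j))
            for j in range(n - k + 1))
    return comb(n, k) * Fraction(factorial(n), n**n) * s

# By hand for n = 3: outcome types (3,0,0), (2,1,0), (1,1,1) give
# Pr(X=0) = 1/9, Pr(X=1) = 2/3, Pr(X=2) = 0, Pr(X=3) = 2/9.
print([str(p(3, k)) for k in range(4)])   # ['1/9', '2/3', '0', '2/9']
```

The probabilities also sum to $1$ for every $n$ tried, in line with the appendix.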

Here is the plot for $80$ bins, which matches Sasha's plot:

[Plot for $80$ bins]

  • It appears that my Generalized Inclusion-Exclusion Principle is similar to, if not the same as, de Moivre's formula. No banner appeared when Sasha answered, so I continued checking numerical examples to make sure my answer was correct. (2011-11-03)
  • Yes, they are alternative ways of writing the same probability. The proof begins with the *absorption identity* ${j\choose k}{n\choose j}={n\choose k}{n-k\choose j-k}$. (2011-11-03)
  • Hmm, your answer seems simpler (in terms of the concepts involved) :) But what do you mean by _intersections_? Could you give some more details on that? (2011-11-03)
  • @vistb: perhaps I should have said "the sum of the numbers of outcomes in the intersections of $j$ of the $S_i$"; that is, $N_1=\sum_{i}|S_i|$ and $N_2=\sum_{i<j}|S_i\cap S_j|$, and so on. (2011-11-03)
  • Have you tried writing down the two-variable generating function $\sum_{n=0}^\infty \sum_{k=0}^n p_n(k) x^n y^k$ to see whether there's a simpler expression for $p_n(k)$? (Possibly the right generating function has factors of $1/n!$ and/or $1/k!$ to make it simpler.) (2011-11-04)