3
$\begingroup$

$n$ items are distributed into $n$ boxes such that each item is independently put with probability $p_j$ to be put in box $j$, for $j=1,2,\ldots,n$, where $\sum_{j=1}^np_j=1$. A collision occurs whenever an item is put into a non-empty box. Find the expected number of collisions.

What's wrong with the following reasoning? Let $C_i$ be the number of collisions in box $i$. Then $C_i+1$ is the number of items in box $i$, so $E[C_i+1]=np_j$ and $E[C_i]=np_j-1$. Therefore the expected number of collisions is $\sum_{j=1}^n(np_j-1)=0$. Obviously the number cannot be zero, because the number is sometimes positive but never negative.

2 Answers 2

2

Let $C_i$ be the number of collisions in box $i$. Then $C_i+1$ is the number of items in box $i$...

That's wrong. More precisely, that's correct only if $C_i >0$. If $C_i=0$ then the number of items in box $i$ can be 0 or 1.

  • 0
    @strake If this is correct, you might consider accepting the answer by clicking the green checkmark.2012-04-19
1

Let $X$ be the number of boxes that end up with $0$ items, and let $Y$ be the number of boxes that end up with $1$ item. Then we have $n-Y$ remaining items, to be distributed between the remaining $n-X-Y$ boxes.

The number of collisions is $n-Y$ minus the number of boxes with more than $1$ item (since the first item put in such a box does not produce a collision). So the number of collisions is $(n-Y)-(n-X-Y)$, which is simply $X$. This is intuitively obvious (though I wasn't immediately sure). Every empty box causes a collision, that's the only way collisions are created, since the number of items and the number of boxes is the same.

What is the expected number of boxes that contain $0$ items? Define the random variable $V_i$ ($V$ for void) by $V_i=1$ if box $i$ ends up empty, and by $V_i=0$ otherwise. Then the number $V$ of empty boxes is given by $V=V_1+V_2+\cdots +V_n$.

Thus, by the linearity of expectation, we have $E(V)=E(V_1+V_2+\cdots +V_n)=E(V_1)+E(V_2)+\cdots +E(V_n).$ It remains to find $E(V_i)$. This is the probability that $V_i=1$, which is $(1-p_i)^n$.