3
$\begingroup$

Let's say that I have a set of unique elements, $P$, and a multiset $M$ that I fill with $N \leq ||P||$ elements by sampling with replacement from $P$. What is the probability that the multiset $M$ contains $0 \leq k \leq N$ elements that occur once (i.e. elements with only a single copy in the multiset)?

The probability of a single element in a multiset of cardinality $N$ occurring only once should be equivalent to tossing $N$ balls into $||P||$ bins, and finding the probability that a particular ball is by itself in a bin.


To get started...

From pg. 95 of "Probability and computing: Randomized algorithms and probabilistic analysis" by Michael Mitzenmacher and Eli Upfal, when we toss $N$ balls into $||P||$ bins, the probability that a specific bin has exactly $r$ balls, P[$r$], is given as:

P[$r$] = ${N \choose r}$ $(\frac{1}{||P||})^r(1-\frac{1}{||P||})^{(N-r)}$

By linearity of expectation, we can now write an expression for the expected number of balls that exist in a bin of $r$ balls as: E[X] = $||P||*r*{N \choose r}$ $(\frac{1}{||P||})^r(1-\frac{1}{||P||})^{(N-r)}$

For $r = 1$, $E[X] = N*(1-\frac{1}{||P||})^{N-1}$

  • 0
    Just count: How many possibilities to choose N elements of p elements and then linearly ordering them? How many possibilities to choose N times one of p elements? And then you do division.2011-04-19

2 Answers 2

2

This is another variant on the birthday problem and the coupon collector's problem. Let's say there are $p$ elements of $P$ (possible birthdays) and a sample size with replacement of $n$ (people in the room). $k$ is then the number of people who do not share a birthday with anyone else in the room.

The expected number who do not share a birthday (i.e. $E[K]$) is not as difficult to state:
$n \left(1-\frac{1}{p}\right)^{n-1}$ while the variance is: $n \left(1-\frac{1}{p}\right)^{n-1} + n (n-1) \left(1-\frac{1}{p}\right) \left(1-\frac{2}{p}\right)^{n-2} - n^2 \left(1-\frac{1}{p}\right)^{2(n-1)}.$

These are similar in form to the expressions I gave on a similar question for number of distinct birthdays (distinct elements of $M$ in your question).

Added:

I think you can calculate the numbers using the following integer recurrence:

$f(a,b,c,d) = (d-b+1) f(a-1,b-1,c-1,d) + (b-a) f(a,b,c-1,d) + (a+1) f(a+1,b,c-1,d)$

starting at $f(a,b,0,d)=0$ except that $f(0,0,0,d)=1$. The probabilities you are looking for will then be

$Pr(K=k|n,p) = p^{-n} \sum_b f(k,b,n,p).$

As an example, looking at $n=5$ and $p=10$, you should find $f(0,1,5,10) = 10$, $f(0,2,5,10) = 900$, $f(1,2,5,10) = 450$, $f(1,3,5,10) = 10800$, $f(2,3,5,10) = 7200$, $f(3,4,5,10) = 50400$, and $f(5,5,5,10) = 30240$. This will lead to probabilities for the number of unique elements of

  • $Pr(K=0|n=5, p=10) = 0.0091$,
  • $Pr(K=1|n=5, p=10) = 0.1125$,
  • $Pr(K=2|n=5, p=10) = 0.0720$,
  • $Pr(K=3|n=5, p=10) = 0.5040$,
  • $Pr(K=4|n=5, p=10) = 0$, and
  • $Pr(K=5|n=5, p=10) = 0.3024$.

Given the lack of an obvious pattern in these probabilities as $k$ increases, I think there is unlikely to be a simple closed form in general.

  • 0
    And again our answers obtained using recurrences and inclusion-exclusion agree :-) (Don't worry, I'm not stalking you, just going through old questions that can be answered using inclusion-exclusion. :-)2016-05-18
0

This can be solved using inclusion-exclusion.

Let $m=|P|$. The probability for $j$ particular elements having been sampled exactly once is

$ \binom Njj!\left(\frac1m\right)^j\left(\frac{m-j}m\right)^{N-j}\;, $

so the probability for exactly $k$ of the elements to have been sampled exactly once is

\begin{align} &m^{-N}\sum_{j=k}^N(-1)^{j-k}\binom mj\binom jk\binom Njj!(m-j)^{N-j}\\ ={}&m^{-N}\sum_{j=k}^N(-1)^{j-k}\frac{m!N!}{(m-j)!k!(j-k)!(N-j)!}(m-j)^{N-j}\\ ={}&m^{-N}\binom Nk\sum_{j=k}^N(-1)^{j-k}\binom{N-k}{j-k}\frac{m!}{(m-j)!}(m-j)^{N-j}\;. \end{align}

This latter form can be interpreted such that we choose $k$ of the $N$ samples at which to sample $k$ unique elements, and then we have $N-k$ conditions that the remaining samples shouldn't sample unique elements; if $j-k$ particular ones of the remaining samples do sample unique elements, there are $j$ unique elements in total, which we can choose in $\frac{m!}{(m-j)!}$ ordered ways (to assign them to the $j$ samples), and the remaining $N-j$ samples can take any of the $m-j$ remaining values.

The expected value of $K$ is more easily calculated with the original form:

\begin{align} E[K] &=m^{-N}\sum_{k=0}^Nk\sum_{j=k}^N(-1)^{j-k}\binom mj\binom jk\binom Njj!(m-j)^{N-j} \\ &=m^{-N}\sum_{j=0}^N\binom mj\binom Njj!(m-j)^{N-j}\sum_{k=0}^j(-1)^{j-k}k\binom jk \\ &=m^{-N}\sum_{j=0}^N\binom mj\binom Njj!(m-j)^{N-j}\delta_{j1} \\ &=N\left(1-\frac1m\right)^{N-1}\;, \end{align}

in agreement with your result. We can also calculate

\begin{align} E\left[K^2\right] &=m^{-N}\sum_{k=0}^Nk^2\sum_{j=k}^N(-1)^{j-k}\binom mj\binom jk\binom Njj!(m-j)^{N-j} \\ &=m^{-N}\sum_{j=0}^N\binom mj\binom Njj!(m-j)^{N-j}\sum_{k=0}^j(-1)^{j-k}k^2\binom jk \\ &=m^{-N}\sum_{j=0}^N\binom mj\binom Njj!(m-j)^{N-j}(\delta_{j1}+2\delta_{j2}) \\ &=N\left(1-\frac1m\right)^{N-1}+N(N-1)\left(1-\frac1m\right)\left(1-\frac2m\right)^{N-2}\;, \end{align}

and thus the variance

\begin{align} \operatorname{Var}\left[K^2\right] &=E\left[K^2\right]-E[K]^2 \\ &=N^2\left(\left(1-\frac1m\right)\left(1-\frac2m\right)^{N-2}-\left(1-\frac1m\right)^{2N-2}\right)+N\left(\left(1-\frac1m\right)^{N-1}-\left(1-\frac1m\right)\left(1-\frac2m\right)^{N-2}\right) \\ &=\frac{2N(N-1)}m+O\left(\frac1{m^2}\right)\;, \end{align}

in agreement with Henry's result.