2
$\begingroup$

Suppose we choose $n$ samples from a set of $N$ elements with replacement. What is the probability of getting $r$ or more (different) elements sampled more than once?

(I already know that for $r=1$ this is the birthday problem, and about calculating the probability for samples, not elements: http://mathforum.org/library/drmath/view/62941.html.)

1 Answers 1

2

For $1\leq i\leq N$, define $A_i$ to be the event that value $i$ occurs only zero times or one time. From the multinomial distribution we get $\mathbb{P}(A_1A_2\cdots A_j)=\sum_{s=0}^j{j\choose s}{n\choose s}s!\left({1\over N}\right)^s\left(1-{j\over N}\right)^{n-s}.$

Then the inclusion-exclusion formula gives $\mathbb{P}(\mbox{at least }k\mbox{ of the }A_i\mbox{s occur})=\sum_{j=k}^N(-1)^{j-k}{j-1\choose j-k}{N\choose j}\mathbb{P}(A_1A_2\cdots A_j).$

You want to substitute $k=N-r+1$ above, and subtract from 1. That is, $\mathbb{P}(\mbox{at least }r\mbox{ repeated values})=1-\mathbb{P}(\mbox{at least }N-r+1\mbox{ of the }A_i\mbox{s occur}).$