
Say I take $N$ samples (with replacement) from an urn with $M$ balls each a distinct color.

I note the result, then repeat the process once again, and note the second result.

I'm interested in the probability that the two samples are the same (that is, both contain the same number of balls of each color).

This is the sum of the squares of the multinomial PMF over the weak compositions of $N$ into $M$ parts. Since the balls are equiprobable, compositions that differ only by a permutation of the colors have the same probability, so the sum can be taken more efficiently over the integer partitions of $N$ into at most $M$ parts, with each term weighted by the number of compositions it represents.
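For concreteness, here is a minimal Python sketch of the partition-based sum described above. The function names `partitions` and `match_probability` are my own, and exact rational arithmetic via `fractions.Fraction` is an assumption made for clarity rather than speed.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def partitions(n, max_parts, max_part=None):
    """Yield partitions of n as non-increasing tuples with at most max_parts parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, max_parts - 1, first):
            yield (first,) + rest


def match_probability(N, M):
    """Exact probability that two size-N samples from M equiprobable colors agree."""
    total = 0
    for lam in partitions(N, M):
        # Multinomial coefficient N! / (lam_1! * lam_2! * ...).
        coeff = factorial(N)
        for part in lam:
            coeff //= factorial(part)
        # Number of weak compositions of length M with this multiset of parts:
        # M! / ((M - k)! * product of multiplicity factorials), k = len(lam).
        arrangements = factorial(M) // factorial(M - len(lam))
        for mult in Counter(lam).values():
            arrangements //= factorial(mult)
        total += arrangements * coeff * coeff
    # Each composition has probability coeff / M**N, and the terms are squared.
    return Fraction(total, M ** (2 * N))
```

For example, `match_probability(2, 2)` sums $(1/4)^2 + (1/4)^2 + (1/2)^2$. The number of partitions still grows quickly with $N$, so this only postpones the blow-up.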

For large $N$ this quickly becomes intractable.

Is there a better way to calculate the desired result, or a reasonably accurate estimator for cases where $N \gg M$?

  • 0
    I would have thought for $N\gg M\gg 0$ you might be able to approximate by (wrongly) assuming that having the same number of reds was independent of having the same number of blues, etc. So perhaps something like $\displaystyle \left( \sum_{n=1}^N \left({N \choose n} \dfrac{(M-1)^{N-n}}{M^N} \right)^2\right)^M$ (2017-01-05)
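The expression in the comment can be evaluated directly. The sketch below is only a transcription of that formula, not a validated estimator. The name `independent_colors_estimate` and the `start` parameter are hypothetical: the comment's sum begins at $n=1$, and passing `start=0` also includes the term where a color is absent from both samples.

```python
from math import comb


def independent_colors_estimate(N, M, start=1):
    """Evaluate the comment's heuristic: per-color agreement raised to the power M.

    Treats each color's count as Binomial(N, 1/M) and (wrongly, as the
    comment notes) assumes the M colors agree independently.
    """
    denom = M ** N
    per_color = sum(
        # Squared binomial PMF: chance both samples hold exactly n of one color.
        (comb(N, n) * (M - 1) ** (N - n) / denom) ** 2
        for n in range(start, N + 1)
    )
    return per_color ** M
```

Comparing this against an exact calculation for small $N$ and $M$ is the obvious way to judge how much the independence assumption costs.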

0 Answers 0