3
$\begingroup$

We have a set of numbers, of size $m$. We are going to pick $a$ numbers with uniform probability from that set, with replacement. Let X be the random variable denoting the probability of having X of those picks distinct (exactly X distinct values are picked).

Motivation: I need to calculate this probability in order to calculate a more advanced distribution regarding Bloom filters, in particular the distribution of the number of bits set to 1 in a Bloom filter.

Letting that aside, I am having trouble formulating the the PMF for X. I've tried to look out for multi-variate binomial distribution but I couldn't relate it to what I want to do.

The question is whether there is such a probability distribution in the literature, and if now, how can I approach this problem ?

Thanks.

Update:

I have managed to make a formulation: the probability we pick $x$ distinct values is $ \frac{1}{m} \frac{1}{m-1} \cdots \frac{1}{m-x+1} $

And the probability of picking the rest of our $a-x$ picks in that set of $x$ values is $ \left(\frac{x}{m}\right)^{a-x} $

Finally, the number of such configurations is $\binom{m}{x}$. Multiplying all that together and simplifying gives us a PMF

$ P(X=x;a,m) = \frac{ \left( \frac{m}{x} \right) ^{x-a}}{x!} $

Does that seem to make any sense ?

  • 0
    Alaggan: The number of combinations possible when $k$ balls are selected from a box with $n$ distinguishable balls (with replacement) is $\frac{(n+k-1)!}{(n-1)!k!}$.2011-01-12

4 Answers 4

1

Look up multiset coefficients in Wikipedia's Multiset. Your calculation would actually give $\frac{m!}{x!}\left(\frac{x}{m}\right)^{(a-x)}\binom {m}{x}$ the way you are arguing but the approach is not correct and this will not sum to 1. You are not counting the correct number of ways to get x different items.

  • 0
    Yes, but if you pick a elements and ask the X be distinct, you have a multiset. You allow that certain members are repeated. This section shows how to count the number of ways to have exactly X distinct when picking a.2011-01-12
1

The number of multisets from a set of size $m$ with cardinality $a$ is $\binom{m+a-1}{a}$.

  • 0
    Great! That is an approach worth trying, thanks :)2011-01-12
0

The probability of picking (with replacement) $a$ distinct items from a set of size $m$ is

$ \frac{m}{m} \frac{m-1}{m} ... \frac{m-a+1}{m} = \frac{(m)_a}{m^a} $

A more wordy version is: There are $a$ terms in the above. Each term represents one pick. Your first pick you can pick any of $m$ items from the set, so the probability you pick $1$ distinct item, from $m$ distinct items is $\frac{m}{m}$. Second term comes from the fact that on your second pick, you have $m-1$ options that don't duplicate the first pick. These are independent events, so you can multiply the probabilities together. Rinse and repeat for a total of $a$ times, and you have the result.

  • 0
    Hey, thanks for the answer. Would you kindly provide some explanation of how you reached this conclusion?2013-06-26
0

With $0 \le x \le a$ and $x \le m$, the probability mass function is given by $P(X=x) = S_2[a,x] \dfrac{m!}{m^a (m-x)!}$ where $S_2[a,x]$ is a Stirling number of the second kind.

Your expression counts things which are not equally probable.

You can find other research on the subject searching for "Stirling numbers of the second kind" together with either "occupancy" or "coupon collector".