14
$\begingroup$

I have a set of $n$ integers $\{1, . . . , n\}$, and I select three values with replacement. How can I find the expected number of distinct values?

Note each value is chosen uniformly and independently.

  • 0
    @Byron: I give links to some other examples of using indicator variables in my answer to that question. And the more complicated problem of the expected number of distinct items when drawing $k$ *pairs* at random [was asked](http://math.stackexchange.com/questions/10124/how-to-calculate-the-expected-number-of-distinct-items-when-drawing-pairs) a while back. Both the "with replacement" and "without replacement" answers are given.2011-10-13

3 Answers 3

13

We first answer the exact question asked, where one draws three items, then we present a more powerful approach which solves the more general case where one draws any number of items.


The probability that the first item was not chosen before is $1$. The probability that the second item was not chosen before is $\frac{n-1}n$. The probability that the third item was not chosen before is $\frac{n-2}n$ if the two first items are different and $\frac{n-1}n$ if the two first items coincide. The two first items are different with probability $\frac{n-1}n$ and they coincide with probability $\frac{1}n$. Hence the expected number of distinct items is $ 1+\frac{n-1}n+\frac{n-2}n\times\frac{n-1}n+\frac{n-1}n\times\frac1n=\frac{3n^2-3n+1}{n^2} $


More generally, consider the number $N_k$ of different items chosen after $k$ picks. Then $N_0=0$ almost surely and, knowing $N_k$ the probability to pick a new item at time $k+1$ is $(n-N_k)/n$ hence $\mathrm E(N_{k+1}\mid N_k)=N_k+\frac{n-N_k}n\qquad\text{almost surely}$ This shows that the expected number of different items after $k$ picks $\mathrm E(N_k)$ is such that $\mathrm E(N_0)=0$ and $\mathrm E(N_{k+1})=\mathrm E(N_k)+\frac{n-\mathrm E(N_k)}n=1+a_n\mathrm E(N_k)$ for every $k\geqslant0$, with $a_n=1-\frac1n$ Thus, for every $k\geqslant0$, $\mathrm E(N_k)=\frac{1-a_n^k}{1-a_n}$ or, equivalently, $ \mathrm E(N_k)=n\,\frac{n^k-(n-1)^k}{n^{k}}=\sum\limits_{i=0}^{k-1}(-1)^{i}{k\choose i+1}\frac1{n^i} $ For example, the fifth row of the Pascal triangle reads $1\quad5\quad10\quad10\quad5\quad1$ hence $E(N_5)=\frac{5n^4-10n^3+10n^2-5n+1}{n^5}$

  • 1
    @robjohn, yep, a nice recursion based on conditional expectations is simply the thing to make a probabilist happy for hours...2011-10-13
11

@Did's method uses recursion very nicely to arrive at the expected number. I arrived at the same answer using more mundane computations.

Let's generalize by picking $p$ numbers from $1\dots n$ with replacement. Let us compute the probability of choosing $d$ distinct numbers. Choose one of the $\binom{n}{d}$ sets of $d$ distinct numbers. The probability of selecting all $p$ picks from those $d$ distinct numbers is $\left(\frac{d}{n}\right)^p$. However, this also counts cases where some of the $d$ numbers were not chosen. Inclusion-exclusion says that the probability of picking all of those $d$ is $ \sum_k(-1)^{k}\binom{d}{d-k}\left(\frac{d-k}{n}\right)^p=\sum_k(-1)^{d-k}\binom{d}{k}\left(\frac{k}{n}\right)^p\tag{1} $ Thus, the probability of picking exactly $d$ distinct items is $\binom{n}{d}$ times $(1)$. The expected value is therefore $ \begin{align} &\sum_d\sum_k(-1)^{d-k}d\binom{n}{d}\binom{d}{k}\left(\frac{k}{n}\right)^p\\ &=\sum_d\sum_k(-1)^{d-k}d\binom{n}{k}\binom{n-k}{n-d}\left(\frac{k}{n}\right)^p\\ &=n-\sum_d\sum_k(-1)^{d-k}(n-d)\binom{n}{k}\binom{n-k}{n-d}\left(\frac{k}{n}\right)^p\\ &=n-\sum_d\sum_k(-1)^{d-k}(n-k)\binom{n}{k}\binom{n-k-1}{n-d-1}\left(\frac{k}{n}\right)^p\\ &=n-\sum_k(n-k)\binom{n}{k}\delta(k-n-1)\left(\frac{k}{n}\right)^p\\ &=n-n\left(\frac{n-1}{n}\right)^p\\ &=n\left(1-\left(\frac{n-1}{n}\right)^p\right)\tag{2} \end{align} $

  • 0
    @Byron: yes, indeed! The probability that all $p$ picks are not $1$ is $\left(\frac{n-1}{n}\right)^p$ so the expected value of $1$ is $1-\left(\frac{n-1}{n}\right)^p$ and so the total expected value is $n\left(1-\left(\frac{n-1}{n}\right)^p\right)$. Very neat!2011-10-13
0

I would like to give a simple reasoning for the same.

Suppose we have a SRS of size p selected from a population of size n with replacement. To find the expected no. of unique elements in the sample, we make use of the linearity of expectation.

Let X: no. of unique elements in p picks Let Xi: Indicator that i is a unique element

Then, X = ∑Xi ,

Where Xi=1, if i is selected in p picks & Xi=0, otherwise

Then,E(X) = ∑E(Xi) = ∑P(i is selected in p picks) = ∑(1-P(i is not selected in p picks)) = ∑(1-((n-1)/n)^p)= n(1−((n-1)/n)^p)

Note: Here, all the picks are independent (SRSWR)

  • 1
    Welcome to MSE. It is in your best interest that you use [MathJax](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference).2018-06-05