0
$\begingroup$

Say we sample $x$ integers uniformly with replacement from $\{1,\dots,n\}$. I want to know the probability that the sample contains two arbitrary values from $\{1,\dots,n\}$. Say for concreteness the values $1$ and $2$. The cdf for the probability of finding both values in a sample of size $x$ is

$1-2\left(\frac{n-1}{n}\right)^x + \left(\frac{n-2}{n}\right)^x$

What is the pmf, that is the probability that the second of the two integers is found after exactly $x$ samples and what can I use that will let me plot it reasonably accurately? I will need some approximation I assume as $n$ will be at least $10000$ in my case.

Update: What is the expect value of the variable $X$ which is the number of samples until both items are found?

Update 2: The answer is given below. However python can't cope with it it seems. The following code has yet to terminate on my home PC.

#!/usr/bin/python from __future__ import division import matplotlib.pyplot as plt def pmf(n, x):     return 2*((n-1)**(x-1) - (n-2)**(x-1))/n**x n = 5000 pmfgraph = [pmf(n,x) for x in xrange(5*n)] plt.plot(pmggraph) plt.show() 

You have to use the other formulation to get it to work.

return (2/n)*(((n-1)/n)**(x-1) - ((n-2)/n)**(x-1)) 

1 Answers 1

0

[EDITED to correct error]

Break it up into two parts: you're asking for the probability that the first $x-1$ values contain exactly one from $\{1,2\}$ and that the $x$'th value is whichever of those two was left out of the first $x-1$. The probability that the first $x-1$ contain at least one $1$ and no $2$ is
$\left(\frac{n-1}{n}\right)^{x-1} - \left(\frac{n-2}{n}\right)^{x-1} $ and the same for at least one $2$ and no $1$. The probability that the $x$'th is whichever was left out is $1/n$. So your answer is $ \frac{2}{n} \left(\left(\frac{n-1}{n}\right)^{x-1} - \left(\frac{n-2}{n}\right)^{x-1} \right) = \frac{2 \left((n-1)^{x-1} - (n-2)^{x-1}\right)}{n^x}$

Any modern plotting software should be able to handle this nicely without approximations. Here it is for $n=10000$, $x = 0$ to $50000$, in Maple 16.

enter image description here

The expected value is $3n/2$.

  • 0
    I posted a follow up to http://math.stackexchange.com/questions/262901/probability-distribution-for-finding-two-values-in-stages .2012-12-21