Say we sample $x$ integers uniformly with replacement from $\{1,\dots,n\}$. I want to know the probability that the sample contains two arbitrary values from $\{1,\dots,n\}$. Say for concreteness the values $1$ and $2$. The cdf for the probability of finding both values in a sample of size $x$ is
$$1-2\left(\frac{n-1}{n}\right)^x + \left(\frac{n-2}{n}\right)^x$$
What is the pmf, that is the probability that the second of the two integers is found after exactly $x$ samples and what can I use that will let me plot it reasonably accurately? I will need some approximation I assume as $n$ will be at least $10000$ in my case.
Update: What is the expect value of the variable $X$ which is the number of samples until both items are found?
Update 2: The answer is given below. However python can't cope with it it seems. The following code has yet to terminate on my home PC.
#!/usr/bin/python
from __future__ import division
import matplotlib.pyplot as plt
def pmf(n, x):
return 2*((n-1)**(x-1) - (n-2)**(x-1))/n**x
n = 5000
pmfgraph = [pmf(n,x) for x in xrange(5*n)]
plt.plot(pmggraph)
plt.show()
You have to use the other formulation to get it to work.
return (2/n)*(((n-1)/n)**(x-1) - ((n-2)/n)**(x-1))
