Say we sample $x$ integers uniformly with replacement from $\{1,\dots,n\}$. I want to know the probability that the sample contains two arbitrary values from $\{1,\dots,n\}$. Say for concreteness the values $1$ and $2$. The cdf for the probability of finding both values in a sample of size $x$ is
$1-2\left(\frac{n-1}{n}\right)^x + \left(\frac{n-2}{n}\right)^x$
What is the pmf, that is the probability that the second of the two integers is found after exactly $x$ samples and what can I use that will let me plot it reasonably accurately? I will need some approximation I assume as $n$ will be at least $10000$ in my case.
Update: What is the expect value of the variable $X$ which is the number of samples until both items are found?
Update 2: The answer is given below. However python can't cope with it it seems. The following code has yet to terminate on my home PC.
#!/usr/bin/python from __future__ import division import matplotlib.pyplot as plt def pmf(n, x): return 2*((n-1)**(x-1) - (n-2)**(x-1))/n**x n = 5000 pmfgraph = [pmf(n,x) for x in xrange(5*n)] plt.plot(pmggraph) plt.show()
You have to use the other formulation to get it to work.
return (2/n)*(((n-1)/n)**(x-1) - ((n-2)/n)**(x-1))