2
$\begingroup$

I have a set of $N$ elements and I iterate the following procedure (all the elements at the beginning are not marked):

  • I pick an element at random.
  • If it is marked, I stop.
  • If it is not marked, I mark it and I repeat the procedure.

How many times on average do I iterate the procedure before stopping because I am picking something I have already picked?

My best shot at solving it

The probability of picking one more, having already picked one is

\begin{equation*} p_i = \frac{N - i}{N} \end{equation*}

The probability of picking at least $i$ is therefore:

\begin{equation*} s_i = \frac{N!}{(N - i)! N^i} \end{equation*}

The probability of picking exactly $i$ is

\begin{equation*} f_i = s_i - s_{i + 1} = \frac{N!}{(N - i)!N^i}\frac{i}{N} \end{equation*}

So far so good. Now, I use Stirling's approximation ($N$ is very large and the probability to have a large $i$ is very small) and (if I didnt' get any calculation wrong) I get

\begin{equation*} f_i = \frac{i}{N} \left(\frac{N}{N - i}\right)^{(N - i + 1/2)} \end{equation*}

The average number of picks $$ is therefore:

\begin{equation*} = \sum_j^N \frac{j^2}{N} e^{-j} \left(\frac{N}{N - j}\right)^{(N - j + 1/2)} \end{equation*}

But here I have no clue on how to go on. I can imagine this is a fairly known problem, but I wouldn't know where to look for, does it have a commonly known name?

  • 0
    The number $X_N$ of picks before getting an already marked item is such that, for every $k\geqslant1$, $$P(X_N\geqslant k)=\prod_{i=1}^{k-1}\left(1-\frac{i}N\right)$$ As long as $k^2\ll N$, the RHS is approximately $$\exp\left(-\sum_{i=1}^{k-1}\frac{i}N\right)=\exp\left(-\frac{k(k-1)}{2N}\right)$$ hence $$\frac{X_N}{\sqrt{N}}\to\xi$$ in distribution where, for every nonnegative $x$, $$P(\xi\geqslant x)=e^{-x^2/2}$$ More refined arguments show the convergence of the moments hence $$\frac{E(X_N)}{\sqrt{N}}\to E(\xi)=\int_0^\infty P(\xi\geqslant x)dx=\int_0^\infty e^{-x^2/2}dx=\sqrt{\frac2\pi}$$2017-01-24

1 Answers 1

0

This has been researched. It is a variant of the Birthday Problem. It seems you're on the right track according to Wikipedia, but your formula is a little off. Knuth gives $1+\sum_{i=1}^n\frac{N!}{(N - i)! N^i}$ as the answer. His proof is in The Art of Computer Programming. Ramanujan also offers an asymptotic approximation in the link, which is also called "Ramanujan's Q function."