5
$\begingroup$

Consider a random process which samples uniformly with replacement from [n]. The expected time to find a duplicate is a constant factor times $\sqrt{n}$. This is a version of the famous Birthday Problem. How can one find the expected time to find the first copy of the first duplicate? This will obviously occur some time before. Also, what is the distribution of this time?

If you fix the position of the later copy of the duplicate then the earlier copy seems to occur uniformly before it but I am not sure how helpful that is to answer the questions.

By time I simply mean that each sample takes one unit of time so the time is just the number of samples at that point. This is not a question about algorithms or computation.

  • 0
    Your comment at the end is quite helpful, at least for the mean.2012-11-17

1 Answers 1

1

Call $2\leqslant D_n\leqslant n+1$ the time of the first duplicate and $1\leqslant C_n\leqslant D_n-1$ the time of the first copy of the first duplicate.

As you noted, for each $2\leqslant k\leqslant n+1$, conditionally on the event $[D_n=k]$, $C_n$ is uniformly distributed on $\{1,2,\ldots,k-1\}$ hence $\mathbb E(C_n\mid D_n=k)=\frac12k$. Thus, $\mathbb E(C_n)=\frac12\mathbb E(D_n)$ and each asymptotics on $\mathbb E(D_n)$ when $n\to\infty$ translates into an asymptotics on $\mathbb E(C_n)$.

Likewise, for every $1\leqslant i\leqslant n$, decomposing the event $[C_n=i]$ into its intersections with the events $[D_n=k]$ for $i+1\leqslant k\leqslant n+1$, one gets $$ \mathbb P(C_n=i)=\sum_{k=i+1}^{n+1}\frac{\mathbb P(D_n=k)}{k-1}. $$ Recall finally that, for every $2\leqslant k\leqslant n+1$, $$ \mathbb P(D_n\geqslant k)=\prod_{\ell=1}^{k-2}\frac{n-\ell}n, $$ hence, for every $1\leqslant i\leqslant n$, $$ \mathbb P(C_n=i)=\frac1n\sum_{k=i}^{n}\prod_{\ell=1}^{k-1}\frac{n-\ell}n=\frac{n!}{n^{n+1}}\sum_{k=0}^{n-i}\frac{n^k}{k!}. $$ Edit: Let $N_n$ denote a Poisson random variable with parameter $n$, then $$ \mathbb P(C_n=i)=\frac{n!\mathrm e^n}{n^{n+1}}\mathbb P(N_n\leqslant n-i). $$ Asymptotics follow from this identity since the prefactor $n!\mathrm e^n/n^{n+1}$ is equivalent to $\sqrt{2\pi/n}$ when $n\to\infty$, and $(N_n-n)/\sqrt{n}$ converges in distribution to a standard normal random variable. Thus, if $i_n/\sqrt{n}\to x$, $$ \mathbb P(C_n=i_n)\sim\sqrt{\frac{2\pi}n}\Phi(-x)=\frac1{\sqrt{n}}\int_x^{+\infty}\mathrm e^{-s^2/2}\mathrm ds. $$ Summing these yields $$ \mathbb P(C_n\geqslant i_n)\sim\int_x^{+\infty}(s-x)\mathrm e^{-s^2/2}\mathrm ds=\int_0^{+\infty}s\mathrm e^{-(s+x)^2/2}\mathrm ds. $$ In other words, $C_n/\sqrt{n}$ converges in distribution to a random variable $C$ whose probability density function $f_C$ is defined, for every $x\geqslant0$, by $$ f_C(x)=\int_x^{+\infty}\mathrm e^{-s^2/2}\mathrm ds. $$

  • 0
    That last equality is not correct. In fact $\mathbb{E}(D_n) = 1 + \sum_{k=1}^n \tfrac{1}{k}$ if I recall correctly.2012-11-18
  • 0
    @WimC Where do you see a formula for $\mathbb E(D_n)$ in my post?2012-11-18
  • 0
    Nowhere, I just put two remarks in a single comment. :-)2012-11-18
  • 0
    @WimC Care to elaborate about the first part of your comment, then?2012-11-18
  • 0
    Take $k=3$. I get $\mathbb{P}(D_n \geq 3) = 1-1/n$ right?2012-11-18
  • 0
    @WimC Right. Thanks for the correction.2012-11-18
  • 0
    @did, Thanks. I find it a little difficult to get an intuitive feel for this function. It seems to be asymptotically like $1/n \int_{k=i}^n e^{-k^2/n}$ which at least gives me some clue to what the distribution looks like.2012-11-18
  • 0
    Is there a simple intuitive explanation for why the mode is $1$ and the function is strictly decreasing that one can understand without the derivation?2012-11-18
  • 0
    Consider $i\lt j$. For every value of $D_n$, conditionally on $D_n$, either $[C_n=i]$ and $[C_n=j]$ are equiprobable (if $D_n\gt j$) or the former has positive probability and the latter is impossible. Summing these shows that $[C_n=i]$ is (globally) more probable than $[C_n=j]$.2012-11-18
  • 0
    *It seems to be asymptotically like...* Which asymptotic regime do you consider?2012-11-18
  • 0
    @did, I just approximate $1-x$ by $e^{-x}$ and the sum by an integral. This is meant to be for large $n$ and ignoring lower order terms and constant factors. Do you know a good way to give an asymptotic approximation or even upper and lower bounds in terms of elementary functions?2012-11-19
  • 0
    See Edit. $ $ $ $2012-11-19