4
$\begingroup$

Assume a process that samples uniformly at random from the range $[1,\ldots,n]$. I am interested in the expected time to find a duplicate given only that the sampling process is pairwise independent. That is I would like to find the expected time until a sample matches one of the samples taken before.

One way to compute the expected time is $E=\sum_{x=1}^\infty P(X \geq x).$ Here $X$ is a random variable that represents the time at which the first duplicate is found.

If the samples were fully independent then we would have that $\sum_{x=1}^\infty P(X \geq x) = \sum_{x=1}^\infty \prod_{i=1}^{x-1} (1-i/n).$ We can bound this quantity using the fact that $e^{-2x} \leq 1-x \leq e^{-x}$ when $0 \leq x \leq 0.5$ and then bound the resulting sum using an integral. This gives an asymptotic bound of $\Theta(\sqrt{n})$ steps.

How would you perform a similar analysis where we only have pairwise independence and also preferably get an asymptotic result rather than just an upper bound?

EDIT: Following Henry's great reply, I would really like to know a lower bound as well as the upper bound he has given. In particular, does the lower bound for fully independent sampling of $\Omega(\sqrt{n})$ also apply to this pairwise independent case? Or alternatively, can anyone think of a pairwise independent process that has mean less than order $\sqrt{n}$?

3 Answers 3

2

The worst case is $O(n)$. Clearly by pigeonhole arguments the expectation cannot exceed $n+1$ but there are examples where it exceeds $n$.

Suppose the samples are $S_1, S_2, \ldots$ and

  • that $S_1$ has a uniform distribution on $\{1,2,\ldots,n\}$,

  • that with probability $\frac{1}{n}$ you have $S_1 = S_2 = \cdots = S_{n} $ and with probability $\frac{n-1}{n}$ you have $S_j$ having a uniform distribution on $\{1,2,\ldots,n\} - \{S_1,\ldots , S_{j-1} \}$ for $2 \le j \le n$ and

  • that $S_{n+1}$ has a uniform distribution on $\{1,2,\ldots,n\}$ independently of all the others,

then there is a probability $\frac{1}{n}$ that you need a sample size of $2$ and a probability $\frac{n-1}{n}$ you need a sample size of $n+1$ to find a duplicate value. This gives an expectation of $n+\frac{1}{n}$ which is $O(n)$.

  • 0
    Bounty awarded as this was the best answer received on math stacke$x$change. Thanks.2012-07-16
1

A simple lower bound proof was given at https://mathoverflow.net/questions/102079/bounds-for-duplicate-finding-with-limited-independence by Douglas Zare. I include a slightly edited copy for completeness.

The expected number of duplicates among the first $m$ is an upper bound for the probability of a duplicate in the first $m$. For $m=c\sqrt{n}$ the expected number of duplicates is under $c/2$, so the probability that the first duplicate is greater than $\sqrt{n}$ is greater than $1/2$, which means the expected first duplicate is greater than $\sqrt{n}/2$.

0

This is the generalized birthday problem. Your $n$ is $d$ in the article.

  • 0
    Good point, thanks.2012-07-13