
Given a randomness source $D$ with min-entropy $k$, we should expect that we will have to take more than $2^{k/2}$ samples before we see a repetition (claim from http://people.csail.mit.edu/madhu/FT04/scribe/lect20.pdf). The exact quote is:

So if a source has min-entropy $k$, the probability that the source gives a particular string is at most $2^{-k}$. This definition is fairly general. If a source has min-entropy $k$, then we expect that we will have to take more than $2^{k/2}$ samples from the source before we see a repetition. Such a source must also have a support that contains more than $2^k$ elements.

I've been trying to prove this and here is my try:

Define a random variable $M$ which is the number of samples taken before a repetition is observed. The PMF of $M$ at $m$ should be the probability of observing a sequence of $m$ strings, none of which are equal to the others. Since the min-entropy is $k$, no string has probability of appearing greater than $2^{-k}$.

Assume all strings have probability $p=2^{-k}$ (I am not sure whether this assumption is a correct strategy, but I don't know what else to do). So in our sequence of length $m$, the first string has probability $p$, the second one has probability $1-p$, the third one has probability $1-p(1-p)$ (assuming independence!), and so forth.

So the PMF should be the product of all these. This is where I am having difficulty writing a general expression for the PMF $f_M$, in order to use it afterwards to calculate the expectation of $M$, which would be $E[M]=\sum\limits_{m=1}^\infty m f_M(m)$.
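As a sanity check I also tried a small simulation (a sketch assuming the equiprobable case, i.e. a uniform source on $2^k$ strings; the function name is mine): it draws until the first repetition and averages the count, which should come out on the order of $2^{k/2}$.

```python
import random

def samples_until_repeat(k, rng):
    """Draw uniformly from {0, ..., 2^k - 1} until a value repeats;
    return the total number of samples drawn (repeat included)."""
    seen = set()
    count = 0
    while True:
        x = rng.randrange(2 ** k)
        count += 1
        if x in seen:
            return count
        seen.add(x)

rng = random.Random(0)  # fixed seed for reproducibility
k = 16
trials = 2000
mean = sum(samples_until_repeat(k, rng) for _ in range(trials)) / trials
print(mean, 2 ** (k / 2))  # mean is a small constant times 2^{k/2} = 256
```

For $k=16$ the empirical mean lands a little above $2^{8}=256$, consistent with the $2^{k/2}$ scaling in the quote.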

Am I thinking on the right track, or am I going in the wrong direction?

1 Answer


In the equiprobable case, when all strings have equal probability $p=2^{-k}$, the event $[M\ge m+1]$ occurs if $m$ consecutive draws each produced a new string. Thus at step $n\le m$, $n$ strings have already appeared and we must pick one of the $2^k-n$ remaining ones. One gets $ P(M\ge m+1)=\prod_{n=1}^m(1-np), $ which yields $ E(M)=\sum_{m\ge0}\prod_{n=1}^m(1-np). $ One could estimate the asymptotics of $E(M)$ when $p\to0$ (1), but the text you quote does not refer to expectations, rather to the minimal $m$ such that $P(M\ge m)$ is small.
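This formula can be evaluated numerically (a sketch; the function name is mine): sum $\prod_{n=1}^m(1-np)$ until the terms vanish. The result already sits close to $\sqrt{\pi/(2p)}$ for moderate $p$.

```python
import math

def expected_M(p):
    """E(M) = sum over m >= 0 of prod_{n=1}^m (1 - n*p);
    the m = 0 term is the empty product, i.e. 1."""
    total, prod, n = 0.0, 1.0, 1
    while prod > 1e-15:
        total += prod
        prod *= 1.0 - n * p
        n += 1
        if prod <= 0.0:  # factors become non-positive once n >= 1/p
            break
    return total

p = 2.0 ** -10
print(expected_M(p), math.sqrt(math.pi / (2 * p)))
```

For $p=2^{-10}$ the ratio $\sqrt{p}\,E(M)$ is already near $\sqrt{\pi/2}\approx1.2533$, in line with note (1) below.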

To estimate the value of such an integer $m$, assume that $p\ll1$ and even that $mp\ll1$. Then, for each $n\le m$, $1-np\approx\text{e}^{-np}$ hence $ P(M\ge m+1)\approx\text{e}^{-m(m+1)p/2}. $ Thus you are interested in values of $m$ such that $m^2p=\Theta(1)$, that is, $m=\Theta(1/\sqrt{p})$. Since $p=2^{-k}$, assuming $k\gg1$ you are done.
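To see how tight the $\text{e}^{-m(m+1)p/2}$ approximation is in the regime $m=\Theta(1/\sqrt{p})$, one can compare it with the exact product (a quick check under the same equiprobable assumption):

```python
import math

p = 2.0 ** -20
m = int(1 / math.sqrt(p))  # m = 1024, so that m^2 * p = 1

exact = 1.0
for n in range(1, m + 1):
    exact *= 1.0 - n * p  # P(M >= m+1), the exact product

approx = math.exp(-m * (m + 1) * p / 2)  # the e^{-m(m+1)p/2} estimate
print(exact, approx)  # relative error on the order of 1e-4
```

So for $mp\ll1$ the exponential estimate is essentially exact, which justifies reading off $m=\Theta(1/\sqrt{p})$ from it.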

Note (1): Indeed, in the limit $p\to0$, one can prove that $\sqrt{p}\,E(M)\to\sqrt{\pi/2}$.

  • This is very interesting, thanks :)! – 2011-08-16