1
$\begingroup$

Take the expression $(1-(1-\frac{1}{n})^x)$, which is the expected fraction of elements one samples from a set of $n$ elements, sampling with replacement (and uniform probability) $x$ times.

Provided some $n$, how do we analytically solve for $x$ s.t. $(1-(1-\frac{1}{n})^x) \geq P$? It appears that $\frac{x}{n}$ converges to a fixed value as $n \to \inf$. Is this true? If so, how do we find it without relying on numerical methods?

  • 1
    Your expression $(1-(1-\frac{1}{n})^x)$ is always less than $1$, so how can this be the expected number of unique elements?? Also, to even have the slightest nonzero chance of covering a portion $P$ with $x$ samples one needs $\frac xn\geq P$, and this requires $x\to\infty$ as $n\to\infty$. So obviously $x$ cannot.2012-10-23
  • 0
    @Marc von Leeuwen Sorry, I meant that the expression was the fraction of a set of n elements covered after sampling with replacement x times.2012-10-23

2 Answers 2

0

We have \begin{align*} 1 - \left(1 - \frac 1n\right)^x \ge P\\ \iff \left(1 - \frac 1n\right)^x \le 1-P\\ \iff x \cdot \log\left(1- \frac 1n\right) \le \log(1-P)\\ \iff x \ge \frac{\log(1-P)}{\log(1 - \frac 1n)} \end{align*} So the smallest such $x$ is $x(n) = \left\lceil\frac{\log(1-P)}{\log(1 - \frac 1n)}\right\rceil$ and $x(n) \to \infty$ as $n \to \infty$ (as $\log$ is continuous and $\log 1 = 0$).

1

The inequality can be rewritten as $$\left(1-\frac{1}{n}\right)^x\le 1-P.$$ We first solve the equation $$\left(1-\frac{1}{n}\right)^t= 1-P.$$ Take the (natural) logarithm of both sides. We get $$t\log\left(1-\frac{1}{n}\right)=\log(1-P).$$ Thus $$t=\frac{\log(1-P)}{\log\left(1-\frac{1}{n}\right)}.$$ Presumably we want $x$ to be an integer. The smallest integer $x$ for which our inequality holds is $\lceil t\rceil$.

It is clear that $x\to \infty$ as $n\to\infty$. However, something interesting happens if we look at $\frac{x}{n}$. We look instead at the closely related $\frac{t}{n}$, which has the same limit. We have $$\frac{t}{n}=\frac{\log(1-P)}{n\log\left(1-\frac{1}{n}\right)}.$$ Note that the denominator $n\log\left(1-\frac{1}{n}\right)$ is equal to $\log\left(\left(1-\frac{1}{n}\right)^n\right)$. But $(1-1/n)^n\to e^{-1}$ as $n\to\infty$, so the denominator approaches $-1$. It follows that as $n\to\infty$, $\dfrac{t}{n}$ (and therefore $\dfrac{x}{n}$) approaches $-\log(1-P)$.

  • 0
    Nicolas Sorry, I meant that the expression was the fraction of a set of n elements covered after sampling with replacement x times.2012-10-23