2
$\begingroup$

I have a friend who has a list of n numbers from [0, x). They are integers (though x is so large that doesn't really matter), and are uniformly randomly distributed. My friend figures out the smallest member from this set, and calls it y.

I know x and y, but not the original list of numbers or n. How can I estimate n? How accurate will my estimate be?

1 Answers 1

1

The bias of any estimator will go to $\infty$ as $n\to\infty$. This is because as $n\to\infty$, the probability that $y=0$ goes to $1$. You have to assign some arbitrary estimate $n_0$ to $y=0$, and then no matter what estimates you assign to the other values of $y$, as $n\to\infty$ the expected value of your estimate will go to $n_0$ as $n\to\infty$, whereas it should be going to $\infty$ along with $n$.

However, you can estimate $1/(n+1)$. This is messy and approximate for a discrete distribution, so I'll derive it for a uniform continuous distribution on $[0,1]$ and you can use it as an approximation for your discrete case.

The cumulative distribution function for $y$ in this case is $1-(1-y)^n$, the density is $n(1-y)^{n-1}$, and the expected value of $y$ is

$ \int_0^1n(1-y)^{n-1}y\,\mathrm dy=\frac1{n+1}\;. $

Thus $y$ is a unbiased estimator for $1/(n+1)$. Its variance is

$ \begin{align} \langle y^2\rangle-\langle y\rangle^2 &= \int_0^1n(1-y)^{n-1}y^2\,\mathrm dy-\frac1{(n+1)^2} \\ &= \frac2{(n+1)(n+2)}-\frac1{(n+1)^2} \\ &= \frac n{(n+1)^2(n+2)} \\ &\sim\frac1{(n+1)^2}\;. \end{align} $

Thus for large $n$ the standard deviation is also approximately $1/(n+1)$.

In the continuous case, you could use $n=y^{-1}-1$ to estimate $n$. The expected value of this estimator is infinite, but at least you'd have the right expected value for $1/(n+1)$. In the discrete case, you can't do this, since typically $y=0$ and then $y^{-1}$ isn't defined.

  • 0
    @Nick: Feel free to ask about any parts you're not sure about.2012-11-04