
https://mathoverflow.net/questions/14964/estimate-population-size-based-on-repeated-observation asks the following question.

I take the bus to work every day. Every bus has a serial number, but unlike in the German Tank Problem, I don't know if they are numbered uniformly $1...n$.

Suppose the first $k$ buses are all different, but on day $k+1$ I take one I've been on before. What is the best estimate for the total number of buses?

The provided answer gives a maximum likelihood estimator as well as the unbiased estimator $k(k+1)/2$.

If you know that the number of buses can't be larger than some given value $N \geq k+1$, how does that change the maximum likelihood estimator and the unbiased estimator?

1 Answer

0

The unbiased estimator doesn't change. Requiring the expected value of the estimate to be $n$ for each of the $N$ possible values of $n$ yields $N$ linear constraints on the $N$ estimates, one for each of the $N$ possible values of $k$. This $N\times N$ system of linear equations is triangular with non-zero diagonal (the probability of observing $k$ is zero for $k>n$ and positive for $k=n$) and thus has a unique solution; since we know that $k(k+1)/2$ is a solution, this is the only unbiased estimator. It yields estimates of $n$ greater than $N$ for most values of $k$, but there is no unbiased estimator without this undesirable property. This is also intuitively clear, since for $n=N$ the estimates below $N$ must be compensated by estimates above $N$ to obtain the expected value $N$.
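As a sanity check on the argument above, here is a minimal sketch that computes the expectation of $k(k+1)/2$ exactly for small $n$, assuming each ride is an independent uniform draw from the $n$ buses. The function names `first_repeat_pmf` and `expected_estimate` are illustrative and do not come from the linked answer.

```python
from fractions import Fraction

def first_repeat_pmf(n, k):
    """P(first k rides are all different and ride k+1 is a repeat | n buses)."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(n - i, n)   # ride i+1 is a bus not seen before
    return p * Fraction(k, n)     # ride k+1 repeats one of the k seen buses

def expected_estimate(n):
    """Exact expected value of the estimate k(k+1)/2 when there are n buses."""
    return sum(first_repeat_pmf(n, k) * Fraction(k * (k + 1), 2)
               for k in range(1, n + 1))

# The estimator is unbiased: the expectation equals n for every n checked.
for n in range(1, 9):
    assert expected_estimate(n) == n

# The probability vanishes for k > n, which is what makes the system triangular.
assert first_repeat_pmf(3, 4) == 0
```

Exact rational arithmetic is used so that the equality checks do not depend on floating-point rounding.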

The maximum likelihood estimator changes only when the original maximum likelihood estimate would have been greater than $N$. In that case it should be replaced by $N$, since the likelihood is unimodal and thus takes its maximum on $\{1,\dotsc,N\}$ at $N$.
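The capping rule can be illustrated with a small brute-force sketch, again assuming independent uniform rides. It simply maximizes the likelihood over $\{1,\dotsc,N\}$ and does not rely on unimodality; `likelihood` and `capped_mle` are hypothetical names chosen for this example.

```python
from fractions import Fraction

def likelihood(n, k):
    """P(first k rides are all different and ride k+1 is a repeat | n buses)."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(n - i, n)
    return p * Fraction(k, n)

def capped_mle(k, N):
    """Value of n in {1, ..., N} that maximizes the likelihood of observing k."""
    return max(range(1, N + 1), key=lambda n: likelihood(n, k))

# For k = 3 the likelihood peaks at n = 5, so a bound N >= 5 changes nothing,
# while a bound N = 4 pulls the estimate down to N itself.
print(capped_mle(3, 10))  # 5
print(capped_mle(3, 4))   # 4
```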

  • 0
    So the unbiased estimator seems like a terrible idea in this case. Is there a better choice? 2012-12-06
  • 0
    @Anush: It's not as terrible as it may seem, since estimates above $N$ are quite unlikely unless $n$ is close to $N$. If $n$ is close to $N$, there's not much you can do; e.g. for $n=N$, either you accept some estimates above $N$ or you get a seriously biased estimator. As always in this area, "a better choice" depends on your criteria for an estimator. Certainly the maximum likelihood estimator is a possible choice that avoids the problem of estimates greater than $N$. 2012-12-06
  • 0
    Thank you very much. I am also interested in a Bayesian approach, but I know that is a separate question. 2012-12-06
  • 0
    Can I ask what the estimators would be if you stop after having seen exactly $k$ different buses? 2012-12-07
  • 0
    @Anush: It depends on why you stop. Do you stop at a predetermined $k$? Or randomly, e.g. with a certain probability after each bus? 2012-12-07
  • 0
    I meant to stop at a predetermined $k$. 2012-12-08
  • 0
    @Anush: There's neither an unbiased nor a maximum likelihood estimator in that case. The likelihood of taking $k$ different buses increases monotonically towards $1$ as $n\to\infty$, so there's no maximum likelihood. Also, since the probability of taking $k$ different buses goes to $1$ and all other probabilities go to $0$ as $n\to\infty$, the expected value of the estimate tends to the estimate for $k$ (whatever that is), whereas it would have to be $n$ (and thus not tend to a constant) if the estimator were unbiased. 2012-12-09
  • 0
    Thanks. Does that mean that when you know the number of buses can't be larger than some given value $N$ and you stop after some predetermined $k$ buses, the MLE is just $N$? 2012-12-09
  • 0
    @Anush: Yes, I think so. 2012-12-09
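The monotonicity claim from the comments can be checked numerically. The following is a rough sketch, assuming independent uniform rides and a predetermined $k$; `all_distinct_prob` is an illustrative name. It computes the probability $\prod_{i=0}^{k-1}(1-i/n)$ that $k$ rides are all on different buses and confirms that it increases with $n$ over the range tested, so that on $\{1,\dotsc,N\}$ the maximum sits at $N$.

```python
from fractions import Fraction

def all_distinct_prob(n, k):
    """P(k rides are all on different buses | n buses), uniform independent rides."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(n - i, n)
    return p

k = 5
probs = [all_distinct_prob(n, k) for n in range(k, 200)]

# Strictly increasing in n over the tested range, and always below 1.
assert all(a < b for a, b in zip(probs, probs[1:]))
assert all(p < 1 for p in probs)

# With an upper bound N, the likelihood is therefore largest at n = N.
N = 50
assert max(range(1, N + 1), key=lambda n: all_distinct_prob(n, k)) == N
```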