8

Is there a motivating reason for using maximum likelihood estimators? As far as I can tell, there is no reason why they should be unbiased estimators (can their expectation even be calculated in a general setting, given that they are defined by a global maximum?). So why are they used?

3 Answers

15

The principle of maximum likelihood provides a unified approach to estimating the parameters of a distribution from sample data. Although ML estimators $\hat{\theta}_n$ are not in general unbiased, they possess a number of desirable asymptotic properties; a small simulation sketch illustrating them is given below.

  • consistency: $\hat{\theta}_n \xrightarrow{p} \theta$ as $n \to \infty$,
  • asymptotic normality: $\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{d} \mathcal{N}(0, \Sigma)$, where $\Sigma^{-1}$ is the Fisher information matrix for a single observation,
  • asymptotic efficiency: the asymptotic variance of $\hat{\theta}_n$ attains the Cramér-Rao lower bound.

Also see Michael Hardy's article "An Illuminating Counterexample" in the American Mathematical Monthly for examples in which biased estimators prove superior to unbiased ones.
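
Here is a minimal simulation sketch of the asymptotic properties above, assuming an Exponential($\lambda$) model as an illustrative choice (the model, seeds, and sample sizes below are not from the question or the discussion). The MLE of the rate is $\hat{\lambda}_n = 1/\bar{x}$, the per-observation Fisher information is $1/\lambda^2$, and the standardized error $\sqrt{n}\,(\hat{\lambda}_n - \lambda)/\lambda$ should look approximately standard normal as $n$ grows:

```python
# Illustrative sketch: consistency and asymptotic normality of the MLE
# for the rate of an Exponential(lambda) model, where the MLE is 1 / sample mean
# and the per-observation Fisher information is 1 / lambda^2.
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0          # true rate parameter
reps = 5000        # Monte Carlo replications

for n in (10, 100, 1000, 10000):
    samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
    mle = 1.0 / samples.mean(axis=1)            # MLE of lambda
    z = np.sqrt(n) * (mle - lam) / lam          # standardized by the asymptotic sd = lambda
    print(f"n={n:6d}  mean(mle)={mle.mean():.4f}  sd(z)={z.std():.3f}")
# mean(mle) approaches lam (consistency) and sd(z) approaches 1,
# consistent with sqrt(n) * (mle - lam) -> N(0, lambda^2).
```
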


Added

The above asymptotic properties hold under certain regularity conditions. Consistency holds if

  • the parameters identify the model (this ensures the existence of a unique global maximum of the log-likelihood function),
  • the parameter space of the model is compact,
  • the log-likelihood function is continuous in the parameters for almost all $x$,
  • the log-likelihood is dominated by an integrable function for all values of the parameters.

Asymptotic normality holds if

  • the true parameter lies away from the boundary of the parameter space,
  • the support of the distribution does not depend on the parameter $\theta$ (see the sketch after this list),
  • the number of nuisance parameters does not depend on the sample size.
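
The support condition is not just a technicality. A minimal sketch, assuming $X_i \sim \mathcal{U}(0,\theta)$ as an illustrative model (my choice, not from the answer): the support depends on $\theta$, the MLE is $\hat{\theta}_n = \max_i X_i$, its error is of order $n^{-1}$ rather than $n^{-1/2}$, and $n(\theta - \hat{\theta}_n)$ converges to an exponential limit with mean $\theta$, not a normal one:

```python
# Illustrative sketch: failure of asymptotic normality when the support depends on theta.
# For X_i ~ Uniform(0, theta) the MLE is max(X_i) and n * (theta - max) -> Exponential(mean=theta).
import numpy as np

rng = np.random.default_rng(1)
theta, reps = 3.0, 20000

for n in (10, 100, 1000):
    x = rng.uniform(0.0, theta, size=(reps, n))
    mle = x.max(axis=1)
    err = n * (theta - mle)                     # error scaled by n, not sqrt(n)
    skew = ((err - err.mean()) ** 3).mean() / err.std() ** 3
    print(f"n={n:5d}  mean={err.mean():.3f}  sd={err.std():.3f}  skew~{skew:.2f}")
# mean and sd both approach theta = 3 and the skewness stays near 2
# (the Exponential value), so the limiting distribution is clearly non-normal.
```
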
  • 2
    Hi Sasha. I think you meant "consistency" in your first bullet point. You might also point out that some smoothness conditions are necessary to guarantee the normality result. In some instances, MLEs can have variance $O(n^{-2})$, for example, in which case they usually converge in distribution to something other than a normal. Consider the MLE of $\theta$ for a $\mathcal U(0,\theta)$ distribution for example.2012-02-24
  • 0
    @cardinal I mean efficiency, as in [efficient statistics](http://en.wikipedia.org/wiki/Efficient_estimator). I agree with the other remarks. Thanks!2012-02-24
  • 0
    Hi Sasha. Sorry I was not very clear at the beginning of my last comment. I believe there is a typo in your first bullet point (not the last one). Cheers.2012-02-24
  • 0
    +1, of course. (I answered this before I noticed that you had also cited my paper.)2012-02-24
  • 0
    Upvoted, but the answer could be more precise, since with enough effort you can make every one of the listed properties break. Normality was already mentioned, and you can break consistency by letting the number of nuisance parameters grow with the sample size.2012-02-24
4

Unbiasedness is overrated by non-statisticians. Sometimes unbiasedness is a very bad thing. Here's a paper I wrote showing an example in which use of an unbiased estimator is disastrous, whereas the MLE is merely bad, and a Bayesian estimator that's more biased than the MLE is good.

Direct link to the pdf file: http://arxiv.org/pdf/math/0206006.pdf

(Now I see Sasha already cited this paper.)
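
For readers who want to experiment, here is a small simulation in the same spirit; the Poisson setup below is a classic example of this phenomenon and may differ in detail from the one in the paper. For a single $X \sim \mathrm{Poisson}(\lambda)$, the only unbiased estimator of $e^{-2\lambda}$ is $(-1)^X$, which only takes the absurd values $\pm 1$; the MLE $e^{-2X}$ is biased but at least lies in $(0,1]$; and a simple Bayes posterior-mean estimator under a Gamma(1,1) prior, $2^{-(X+1)}$, is also biased but has much smaller mean squared error:

```python
# Illustrative sketch (classic Poisson example; details may differ from the paper's exact setup):
# estimate exp(-2*lambda) from a single Poisson(lambda) observation.
import numpy as np

rng = np.random.default_rng(2)
lam = 1.0
target = np.exp(-2.0 * lam)              # quantity to estimate: exp(-2*lambda)
x = rng.poisson(lam, size=200_000)       # one Poisson observation per replication

unbiased = (-1.0) ** x                   # the only unbiased estimator: takes values +1 and -1
mle = np.exp(-2.0 * x)                   # MLE, by functional invariance of maximum likelihood
bayes = 0.5 ** (x + 1)                   # posterior mean of exp(-2*lambda) under a Gamma(1, 1) prior

for name, est in [("unbiased", unbiased), ("MLE", mle), ("Bayes", bayes)]:
    print(f"{name:9s} bias={est.mean() - target:+.4f}  MSE={((est - target) ** 2).mean():.4f}")
# Typical output: the unbiased estimator has ~zero bias but by far the largest MSE;
# both biased estimators have far smaller MSE, with the Bayes estimator doing best here.
```
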

  • 0
    I'm not completely buying it. The pathological examples given were based on sampling only once. One might expect that for iid $X_1,\ldots,X_n$ there exists some unbiased estimator $f(X_1,\ldots,X_n)$ that is "reasonable". Presumably if $n$ is large then pathological bad behavior can be avoided in $f$.2012-02-25
  • 0
    @user what exactly are you not buying? Your expectation is off; there is no reason a priori to think that an unbiased estimator of a particular quantity exists (they often don't, and if you take a moment to think about what unbiasedness means in terms of integrals it really isn't surprising), let alone that this estimator is "reasonable." In fact there are completely non-pathological examples where a biased estimator can be shown to be "better" than every unbiased estimator, for example in estimating the variance of an iid sample of normal random variables with unknown mean.2012-02-25
  • 0
    $-2X$ was the unbiased estimator. Certainly if $n$ is large, then $-2(X_1+\cdots+X_n)/n$ is an unbiased estimator that has a high probability of being near the point to be estimated, and only a small probability of being outside the parameter space. (MLEs, on the other hand, are _never_ outside of the parameter space.) So it's true that when the sample is large, there's not as much of a problem as when it's small. That shouldn't surprise anyone. But dealing rationally with small samples is necessary. That's why Student's distribution was introduced.2012-02-25
  • 0
    Can you explain what you mean by your reference to Student's distribution?2012-02-25
  • 0
    @user782220 : The motivation for Student's distribution is to deal with smallness of the sample size.2012-02-27
0

Consistency, normality, and efficiency of the maximum likelihood estimator play an important role when the sample size is very large. In most situations, however, we do not have that many samples. Hence, these properties are not critical for supporting the maximum likelihood estimator. On the other hand, unbiased estimators sometimes do not work well, as Dr. Hardy pointed out. I would add one more defect of unbiased estimators: most unbiased estimators are not invariant under reparameterization (see the sketch below).
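
A standard example of this non-invariance, as a minimal sketch (the normal model and the numbers below are an illustrative choice, not from the paper): the $1/(n-1)$ sample variance is unbiased for $\sigma^2$, but its square root is not unbiased for $\sigma$, so "take the unbiased estimator" does not commute with reparameterization, whereas the maximum likelihood recipe does (the MLE of $\sigma$ is simply the square root of the MLE of $\sigma^2$):

```python
# Illustrative sketch: unbiasedness is not preserved under reparameterization,
# while the maximum likelihood estimate transforms directly.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 2.0, 5, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
s2_unbiased = x.var(axis=1, ddof=1)      # unbiased for sigma^2
s2_mle = x.var(axis=1, ddof=0)           # MLE of sigma^2 (biased)

print("mean of s2_unbiased      :", s2_unbiased.mean(), "(unbiased for sigma^2 =", sigma**2, ")")
print("mean of sqrt(s2_unbiased):", np.sqrt(s2_unbiased).mean(), "(biased low for sigma =", sigma, ")")
print("mean of sqrt(s2_mle)     :", np.sqrt(s2_mle).mean(), "(the MLE of sigma, also biased)")
# Neither square root is unbiased for sigma: unbiasedness is destroyed by the
# reparameterization, whereas the ML estimate of sigma is obtained from the
# ML estimate of sigma^2 by simply taking the square root (functional invariance).
```
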

Given these facts, I have suggested the "maximum likelihood estimator in the light of future data." This term is based on the understanding that conventional maximum likelihood estimators fit parameters to the current data, but we should fit parameters to future data, because our estimator should explain the data that will be obtained in the future (in short, our purpose is prediction). This new estimator is invariant because it is a kind of maximum likelihood method, although it is derived in the light of future data instead of the current data.

This new concept changes, for example, the estimate of the variance of the normal distribution. If you are interested in this idea, please refer to the paper below.

Takezawa, K. (2012). "A Revision of AIC for Normal Error Models," Open Journal of Statistics, Vol. 2, No. 3, pp. 309-312. doi: 10.4236/ojs.2012.23038. http://www.scirp.org/journal/PaperInformation.aspx?paperID=20651