8

Is there a motivating reason for using maximum likelihood estimators? As far as I can tell, there is no reason why they should be unbiased estimators (can their expectation even be calculated in a general setting, given that they are defined by a global maximum?). So why are they used?

3 Answers

15

The principle of maximum likelihood provides a unified approach to estimating the parameters of a distribution from sample data. Although ML estimators $\hat{\theta}_n$ are not in general unbiased, they possess a number of desirable asymptotic properties (illustrated in the simulation sketch after the list):

  • consistency: $\hat{\theta}_n \stackrel{p}{\to} \theta$ as $n \to \infty$;
  • asymptotic normality: $\sqrt{n}\,(\hat{\theta}_n - \theta) \stackrel{d}{\to} \mathcal{N}(0, \Sigma)$, where $\Sigma^{-1}$ is the Fisher information matrix;
  • efficiency: $\operatorname{Var}(\hat{\theta}_n)$ approaches the Cramér-Rao lower bound.
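
For concreteness, here is a minimal simulation sketch (the exponential model and the sample sizes are illustrative choices of mine, not part of the theory above). For $\mathrm{Exp}(\theta)$ the MLE has the closed form $\hat{\theta}_n = 1/\bar{x}_n$ and the Fisher information is $I(\theta) = 1/\theta^2$, so the standard deviation of $\sqrt{n}\,(\hat{\theta}_n - \theta)$ should approach $\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0  # true rate of the Exp(theta) model

for n in [10, 100, 1000, 10000]:
    # The MLE of an exponential rate is 1 / sample mean (closed form).
    estimates = np.array([1.0 / rng.exponential(1.0 / theta_true, n).mean()
                          for _ in range(2000)])
    z = np.sqrt(n) * (estimates - theta_true)
    # Consistency: the mean estimate approaches theta. Normality/efficiency:
    # the sd of sqrt(n)*(theta_hat - theta) approaches sqrt(1/I(theta)) = theta.
    print(f"n={n:5d}  mean estimate={estimates.mean():.4f}  "
          f"sd of sqrt(n)*error={z.std():.4f}")
```

As $n$ grows, the mean estimate settles at $\theta = 2$ and the spread of the standardized error settles at $\theta = 2$, exactly as the three properties predict.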

Also see Michael Hardy's article "An Illuminating Counterexample" in the American Mathematical Monthly for examples in which biased estimators prove superior to unbiased ones.


Added

The above asymptotic properties hold under certain regularity conditions. Consistency holds if

  • the parameters identify the model (this ensures the existence of a unique global maximum of the log-likelihood function; a failure is sketched after this list),
  • the parameter space of the model is compact,
  • the log-likelihood is a continuous function of the parameters for almost all $x$,
  • the log-likelihood is dominated by an integrable function for all values of the parameters.
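
For example, when identification fails the likelihood can be constant along entire curves in the parameter space, so no unique global maximum exists. A minimal sketch (the over-parameterized Gaussian mean model below is an illustrative choice of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(3.0, 1.0, size=500)  # data with true mean 3.0

def loglik(a, b, x):
    # Gaussian log-likelihood up to constants: only the sum a + b enters.
    return -0.5 * np.sum((x - (a + b)) ** 2)

# Very different parameter pairs with a + b = 3 give identical likelihoods,
# so the maximizer is a whole line in (a, b)-space, not a point.
print(loglik(1.0, 2.0, x))
print(loglik(-5.0, 8.0, x))
```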

Asymptotic normality holds if

  • the true parameter lies in the interior of the parameter space (away from its boundary),
  • the support of the distribution does not depend on the parameters $\theta$ (a failure is sketched after this list),
  • the number of nuisance parameters does not depend on the sample size.
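
A standard counterexample to the support condition is $U(0, \theta)$: the MLE is the sample maximum, and it is $n(\theta - \hat{\theta}_n)$, not $\sqrt{n}\,(\hat{\theta}_n - \theta)$, that has a limit, and that limit is exponential rather than normal. A quick simulation sketch (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n = 5.0, 10000
# The MLE for Uniform(0, theta) is max(x); because the support depends on
# theta, errors shrink at rate 1/n and the rescaled error n*(theta - max(x))
# converges to an Exponential(rate 1/theta) limit, not a normal one.
errs = np.array([n * (theta - rng.uniform(0, theta, n).max())
                 for _ in range(2000)])
print(f"mean={errs.mean():.3f}  sd={errs.std():.3f}  median={np.median(errs):.3f}")
```

The output shows mean and standard deviation both near $\theta = 5$ and median near $\theta \ln 2 \approx 3.5$, the signature of an exponential limit; a Gaussian limit would have a median equal to its mean.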
  • Upvoted, but the answer could be more precise, since with enough effort you can make all of the given properties break. Normality was already mentioned, and you can break consistency by letting the number of nuisance parameters grow with the sample size. (2012-02-24)
4

Unbiasedness is overrated by non-statisticians. Sometimes unbiasedness is a very bad thing. Here's a paper I wrote showing an example in which use of an unbiased estimator is disastrous, whereas the MLE is merely bad, and a Bayesian estimator that's more biased than the MLE is good.

Direct link to the pdf file: http://arxiv.org/pdf/math/0206006.pdf

(Now I see Sasha already cited this paper.)
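
To give the flavor of what can go wrong (a classic example of this kind, sketched from memory rather than quoted from the paper): if $X \sim \mathrm{Poisson}(\lambda)$ and the estimand is $e^{-2\lambda}$, the only unbiased estimator based on one observation is $(-1)^X$, which takes only the values $\pm 1$ even though the estimand lies in $(0,1)$; the MLE $e^{-2X}$ is biased but always lands in a sensible range:

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 1.0
target = np.exp(-2 * lam)  # e^{-2 lambda}, about 0.135

x = rng.poisson(lam, size=100000)
unbiased = (-1.0) ** x   # the unique unbiased estimator: only ever +1 or -1
mle = np.exp(-2.0 * x)   # the MLE e^{-2X}: biased, but always in (0, 1]

print(f"target               = {target:.4f}")
print(f"mean of (-1)^X       = {unbiased.mean():.4f}  (unbiased on average, absurd pointwise)")
print(f"mean of MLE e^(-2X)  = {mle.mean():.4f}  (biased, yet every estimate is sensible)")
```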

  • @user782220: The motivation for Student's distribution is to deal with the smallness of the sample size. (2012-02-27)
0

Consistency, normality, and efficiency of the maximum likelihood estimator play an important role when the sample size is very large. In most situations, however, we do not have that many samples. Hence, these properties are not by themselves critical support for the maximum likelihood estimator. On the other hand, unbiased estimators sometimes do not work well, as Dr. Hardy pointed out. I would add one more defect of unbiased estimators: most unbiased estimators are not invariant under reparameterization (see the sketch below).
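
To illustrate the invariance point with the standard normal-variance example (an illustration of mine, not taken from the paper cited below): the MLE of $\sigma$ is the square root of the MLE of $\sigma^2$, whereas the square root of the unbiased estimator $S^2$ is not an unbiased estimator of $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n, reps = 2.0, 5, 200000

x = rng.normal(0.0, sigma, size=(reps, n))
s2_unbiased = x.var(axis=1, ddof=1)  # unbiased for sigma^2
s2_mle = x.var(axis=1, ddof=0)       # the MLE of sigma^2 (biased)

print(f"mean of S^2        = {s2_unbiased.mean():.4f}  (target {sigma**2:.1f}: unbiased)")
print(f"mean of sqrt(S^2)  = {np.sqrt(s2_unbiased).mean():.4f}  (target {sigma:.1f}: biased!)")
print(f"mean of sqrt(MLE)  = {np.sqrt(s2_mle).mean():.4f}  (biased too, but it is the MLE of sigma, by invariance)")
```

Unbiasedness is destroyed by the nonlinear reparameterization $\sigma = \sqrt{\sigma^2}$, while the maximum likelihood property survives it.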

Given these facts, I have suggested the "maximum likelihood estimator in the light of future data." This term is based on the understanding that conventional maximum likelihood estimators fit parameters to the current data, whereas we should fit parameters to future data, because our estimator should explain the data that will be obtained in the future (in short, our purpose is prediction). This new estimator is invariant because it is a kind of maximum likelihood method, although it is derived in the light of future data instead of the current data.

This new concept changes, for example, the definition of the variance of the normal distribution. If you are interested in this idea, please refer to the paper below.

Takezawa, K. (2012). "A Revision of AIC for Normal Error Models," Open Journal of Statistics, Vol. 2, No. 3, pp. 309-312. doi: 10.4236/ojs.2012.23038. http://www.scirp.org/journal/PaperInformation.aspx?paperID=20651