3

Given $N$ points ($x_k$, $k$ from $1$ to $N$) generated from a normal distribution (one-dimensional case) with known mean $\mu$, the maximum likelihood estimate of the variance is $\frac{1}{N}\sum_{k=1}^{N} (x_k-\mu)^2$.

How is it justified that this estimate is biased for finite $N$?

Thanks, Nikos

  • 0
    MLEs are not required to be unbiased. I'm confused as to what you're asking. (2011-10-18)
  • 3
    Since $\mu$ is known, $\frac{1}{N}\sum_{k=1}^N(x_k - \mu)^2$ _is_ an unbiased estimator for $\sigma^2$. If $\mu$ is unknown and the sample mean $\bar{x} =\frac{1}{N}\sum_{k=1}^N x_k$ is used instead, then $\frac{1}{N}\sum_{k=1}^N(x_k - \bar{x})^2$ is a biased estimator for $\sigma^2$. (2011-10-18)
  • 0
    @Dilip Sarwate: But in the text I am currently reading, it is explicitly mentioned that in the ML case the estimate is unbiased only asymptotically, not for finite $N$. (2011-10-18)
  • 3
    "in the ML case the estimation is unbiased only asymptotically and not for finite $N$" is perfectly correct. If the mean $\mu$ is unknown, the ML estimator for $\sigma^2$ is $\frac{1}{N}\sum_{k=1}^N(x_k - \bar{x})^2$ and _is_ biased. It is asymptotically unbiased as $N \to \infty$. (The unbiased estimator in this case is $\frac{1}{N-1}\sum_{k=1}^N(x_k - \bar{x})^2$). But your question specifically stated that $\mu$ is known and in this case _(which is different from the case of unknown $\mu$)_, the ML estimator for $\sigma^2$ is what you said, and it is unbiased for all $N$.2011-10-18
  • 0
    In the text it is said that the expectation of the ML estimate of the variance is equal to $\frac{1}{N}\sum_{k=1}^N E[(x_k - \mu)^2] = \frac{N-1}{N}\sigma$, all that for known $\mu$. (2011-10-18)
  • 0
    What text is this? And are you absolutely sure you have copied what your text says _exactly_? E.g., $\sigma$ instead of $\sigma^2$? (2011-10-18)
  • 0
    If that's what the book said, then I suspect the book was treating the case where $\mu$ is not known but has to be estimated. Certainly in the case where $\mu$ has to be estimated, the MLE is biased. (2011-10-19)
  • 0
    @Dilip Sarwate: no, it says $\sigma^2$. (2011-10-19)

2 Answers

3

Filling in some details left out of the comments by @Michael Hardy, @zyx, and myself, suppose $\vec{X} = (X_1, \ldots, X_N)$ where the $X_i, 1 \leq i \leq N,$ are independent $N(\mu, v)$ random variables with known mean $\mu$ and unknown variance $v$. The joint density function is $$f_{\vec{X}}(\vec{x}) = (2\pi v)^{-N/2}\exp(-a/v) ~\text{where}~ \vec{x} = (x_1, \ldots, x_N)~ \text{and}~ a = \frac{1}{2} \sum_{i=1}^N (x_i - \mu)^2.$$ If $\vec{X}$ is observed to have value $\vec{x}$, the likelihood function is $L(v) = (2\pi v)^{-N/2}\exp(-a/v)$, and it is easy to show that $L(v)$ attains its maximum at $$v = \frac{2a}{N} = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2,$$ and so the maximum likelihood estimator (MLE) of $v$ is $\frac{1}{N}\sum_{i=1}^N (X_i - \mu)^2$.

We have $$ E\left [ \frac{1}{N}\sum_{i=1}^N (X_i - \mu)^2 \right ] = \frac{1}{N}\sum_{i=1}^N E[(X_i - \mu)^2] = \frac{1}{N}\sum_{i=1}^N v = v, $$ and thus, contrary to any alleged claims in an unspecified textbook in the possession of Nikos, the MLE for $v$ is unbiased in this instance.
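A quick numerical sanity check of this (a minimal simulation sketch I am adding here; the true parameters, sample size, and seed are arbitrary choices, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, v, N, trials = 2.0, 4.0, 5, 200_000  # true mean, true variance, deliberately small N

# trials independent samples of size N from N(mu, v)
x = rng.normal(loc=mu, scale=np.sqrt(v), size=(trials, N))

# MLE with *known* mu: (1/N) * sum_k (x_k - mu)^2, one estimate per sample
mle_known_mu = np.mean((x - mu) ** 2, axis=1)

print(mle_known_mu.mean())  # ~ 4.0 even for N = 5: no bias
```

Even with $N = 5$, the average of the estimates hovers around the true $v = 4$, which is exactly the unbiasedness just derived.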

What if $\mu$ is also unknown? The likelihood function is now $$ L(\mu, v) = (2\pi v)^{-N/2}\exp\left [ -\frac{1}{2v}\sum_{i=1}^N (x_i - \mu)^2\right ] $$ and has a global maximum at $$ (\mu, v) = \left (\frac{1}{N}\sum_{i=1}^N x_i, \frac{1}{N}\sum_{i=1}^N \left (x_i -\frac{1}{N}\sum_{i=1}^N x_i \right)^2 \right ) = \left ( \bar{x}, \frac{1}{N}\sum_{i=1}^N (x_i -\bar{x})^2 \right ).$$ The MLE for $v$ is thus $\frac{1}{N}\sum_{i=1}^N (x_i -\bar{x})^2$ and is biased, since $$ E\left [ \frac{1}{N}\sum_{i=1}^N \left (X_i -\frac{1}{N}\sum_{i=1}^N X_i \right)^2 \right ] = \left(\frac{N-1}{N}\right)v, $$ but, as noted in Nikos's textbook, the MLE for $v$ is asymptotically unbiased in the limit as $N \to \infty$. On the other hand, it should be obvious from the above description that $\frac{1}{N-1}\sum_{i=1}^N (x_i -\bar{x})^2$ is an unbiased estimator for $v$ for all $N \geq 2$.
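And the corresponding check for the unknown-$\mu$ case (same hypothetical simulation setup as above):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, v, N, trials = 2.0, 4.0, 5, 200_000

x = rng.normal(loc=mu, scale=np.sqrt(v), size=(trials, N))
xbar = x.mean(axis=1, keepdims=True)  # sample mean stands in for the unknown mu

# MLE with *estimated* mean: (1/N) * sum_k (x_k - xbar)^2
mle_est_mu = np.mean((x - xbar) ** 2, axis=1)

print(mle_est_mu.mean())                # ~ (N-1)/N * v = 3.2: biased low
print(mle_est_mu.mean() * N / (N - 1))  # Bessel's correction: ~ 4.0
```

The simulated mean lands near $\frac{N-1}{N}v = 3.2$ rather than $4$, matching the bias factor above, and rescaling by $\frac{N}{N-1}$ recovers the unbiased estimate.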

  • 0
    The textbook in question is _Pattern Recognition_, 4th ed., by S. Theodoridis and K. Koutroumbas. If you happen to have it in your possession, take a look at pp. 35 and 37. (2011-10-19)
  • 0
    Actually, the $N(\mu,v)$ density is proportional to $\exp(-(x-\mu)^2/(2v))$. You're missing a "2" in the denominator. (2011-10-19)
  • 2
    @MichaelHardy I thought I included the $2$ in the denominator as part of the quantity I called $a$. Is there a $2$ missing elsewhere? (2011-10-19)
0

The "bias for finite N" is for a different estimator where $\mu$ is replaced by $\overline{x} = \frac{x_1 + \cdots + x_N}{N}$. If the "true" value of $\mu$ is given the estimator in the question is unbiased.

This was the subject of an earlier question,

Intuitive Explanation of Bessel's Correction

and in my answer there it is explained why the calculation with $\mu$ is unbiased.
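In brief (a one-line version of the standard argument, not quoted from the linked answer): with the true $\mu$ plugged in, each term of the sum already has expectation exactly $\sigma^2$, $$E\left[\frac{1}{N}\sum_{k=1}^N (x_k-\mu)^2\right] = \frac{1}{N}\sum_{k=1}^N E\left[(x_k-\mu)^2\right] = \frac{1}{N}\cdot N\sigma^2 = \sigma^2,$$ whereas replacing $\mu$ by $\overline{x}$ shrinks each squared deviation on average, which is where the factor $\frac{N-1}{N}$ comes from.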

  • 0
    This is wrong in this case. Maximum likelihood estimators are typically biased, but this one is not. (2011-10-18)
  • 0
    @Michael: I missed the $\mu$, since the question talked about "bias for finite $N$", which exists only if $\overline{x}$ is meant. See edit. (2011-10-18)
  • 0
    @Michael: one can also see from the comments, where a factor of $N-1$ appears in a formula with $\mu$, that either the question or the textbook is using $\mu$ where the semantics dictate $\overline{x}$ as the correct reading. (2011-10-18)
  • 1
    @zyx: It's not only that it uses $\mu$ as a symbol; it is also explicitly mentioned in the formulation of the problem that $\mu$ is known. (2011-10-19)
  • 0
    Nikos, where you wrote "in the text it is said that the expectation of the ML estimation of the variance is equal to $\frac{1}{N}\sum_{k=1}^N E[(x_k - \mu)^2] = \frac{N-1}{N}\sigma^2$", the factor of $N-1$ can appear only if $\mu$ is *unknown* and therefore estimated using the $x_i$. So the question and/or the book does use $\mu$ as a symbol, but the notion of "bias for finite $N$" and the formula you wrote down make sense only if $\mu$ is interpreted as the name for an unknown parameter that is estimated by $\overline{x} = (x_1 + \cdots + x_N)/N$. (2011-10-19)