
The Wikipedia page *Unbiased estimation of standard deviation* says that "it follows from Jensen's inequality that the square root of the sample variance is an underestimate".

I do know that, for the concave square-root function, Jensen's inequality says that the square root of the mean is greater than or equal to the mean of the square root.

So, how do we conclude that the square root of the sample variance underestimates population standard deviation?

Since we know from Jensen's inequality that square root of the mean > mean of the square root, does "square root of sample variance" somehow relate to "mean of the square root" while "population standard deviation" somehow relates to "square root of the mean"?

Added after joriki's response:

Given joriki's response about using a single sample of data, I am now left with why $s=\sqrt{\frac{1}{N-1}\sum_{i=1}^N{(x_i-\overline{x})^2}}$ will underestimate the population standard deviation. In order to use Jensen's inequality (mean of the square root < square root of the mean), I need to somehow relate the expression for $s$ to a "mean of a square root". I do see the square root sign in the expression for $s$, but where is the "mean" of this square-root quantity?

  • @JohnC: You ask where the mean is. Suppose we use the already square-rooted expression as an *estimator* of the population standard deviation $\sigma$. We imagine taking the mean of this estimator in order to find out whether it is biased or not. If the mean of the estimator is $\sigma$, then it is unbiased. The Jensen's inequality argument shows that, except in trivial cases, the mean is *less* than $\sigma$. This lets us conclude that the estimator is not unbiased and, moreover, tells us in which direction it errs on average.

3 Answers


The mean is part of what it means for an estimator to be biased. You can't make the estimator unbiased by averaging over several estimates; to the contrary, you can show that it's biased by averaging over estimates and showing that the expected average isn't the value to be estimated. (You can reduce the bias and the variance of the estimator by averaging several estimates, but as discussed above you can do that even better by using all the data for one estimate.)

For example, suppose your population has equidistributed values $-1,0,1$, with variance $\frac23$, and you take a sample of $2$. You'll get variance estimates of $0$, $\frac12$ and $2$ with probabilities $\frac13$, $\frac49$ and $\frac29$, respectively, yielding the correct mean $\frac13\cdot0+\frac49\cdot\frac12+\frac29\cdot2=\frac23$. The corresponding estimates for the standard deviation, $0$, $\sqrt{\frac12}$ and $\sqrt2$, however, average to $\frac13\cdot0+\frac49\cdot\sqrt{\frac12}+\frac29\cdot\sqrt2=\frac49\sqrt2\neq\sqrt{\frac23}$, with $\frac49\sqrt2\approx0.6285\lt0.8165\approx\sqrt{\frac23}$: an underestimate, as expected. If you take a sample of $3$ instead, the mean improves to $\frac19\cdot0+\frac49\cdot\sqrt{\frac13}+\frac29\cdot\sqrt{\frac43}+\frac29\cdot1=\frac19(8\sqrt{\frac13}+2)\approx0.7354$.
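If it helps to verify these numbers, here is a minimal Python sketch that simply enumerates all equally likely samples and averages the resulting variance and standard-deviation estimates; the population and the sample sizes are just those from the example above.

```python
from itertools import product
from math import sqrt
from statistics import variance  # sample variance with the n-1 denominator

population = [-1, 0, 1]  # equidistributed; true variance 2/3, true sd about 0.8165

for n in (2, 3):
    samples = list(product(population, repeat=n))   # all 3**n equally likely ordered samples
    mean_var = sum(variance(s) for s in samples) / len(samples)
    mean_sd = sum(sqrt(variance(s)) for s in samples) / len(samples)
    print(n, round(mean_var, 4), round(mean_sd, 4))
    # n = 2: mean_var is about 0.6667, mean_sd about 0.6285
    # n = 3: mean_var is about 0.6667, mean_sd about 0.7354
```

The average of the variance estimates stays at the true value $\frac23$, while the average of the square-rooted estimates stays below $\sqrt{\frac23}$, matching the figures worked out above.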


Let's assume we're picking $n$ independent samples from the same (unknown) distribution. Thus, the samples $x_1, x_2, \dotsc, x_n$ are independent and identically distributed random variables with some unknown mean $\mu$ (which we may approximate by the sample mean $\bar x = \frac 1 n \sum x_i$) and standard deviation $\sigma$, which we wish to estimate.

As André Nicolas notes in his first comment, the sample variance $\tilde \sigma^2 = \frac 1{n-1} \sum_{i=1}^n(x_i-\bar x)^2$ is a random variable whose mean, or expected value, $\mathrm E[\tilde \sigma^2]$ equals the true variance $\sigma^2$ of the unknown distribution; that is, $\tilde \sigma^2$ is an unbiased estimator of $\sigma^2$. However, because the square root function is concave, Jensen's inequality says that the mean $\mathrm E[\tilde \sigma]$ of its square root $ \tilde \sigma = \sqrt{\tilde \sigma^2} = \sqrt{\frac 1{n-1} \sum_{i=1}^n(x_i-\bar x)^2} $ is (except in trivial cases) strictly less than the square root $\sigma$ of its mean $\mathrm E[\tilde \sigma^2] = \sigma^2$. Thus, $\tilde \sigma$ underestimates the true standard deviation $\sigma$ on average: $\mathrm E[\tilde \sigma] < \sigma$.
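To see the size of this bias numerically, one can approximate $\mathrm E[\tilde \sigma]$ by simulation. A small sketch, assuming (purely for illustration) a standard normal population, so that $\sigma = 1$, and samples of size $n = 5$:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
sigma = 1.0            # true standard deviation of the assumed N(0, 1) population
n = 5                  # sample size; the downward bias shrinks as n grows
trials = 200_000       # number of simulated samples

x = rng.normal(loc=0.0, scale=sigma, size=(trials, n))
s = x.std(axis=1, ddof=1)        # square root of the unbiased variance estimate, per sample

print(s.mean())        # about 0.94: noticeably below sigma = 1
print((s**2).mean())   # about 1.00: the variance estimate itself is unbiased
```

The average of $s$ sits below $\sigma$ even though the average of $s^2$ is on target, which is exactly the Jensen effect described above.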


Let's warm up. For weights $r_i \ge 0$ with $\sum_{i} r_i = 1$, concavity of the square root gives \begin{align*} \sqrt{r_1 a_1 + r_2 a_2 + r_3 a_3} &\ge r_1 \sqrt{a_1} + r_2 \sqrt{a_2} + r_3 \sqrt{a_3}. \end{align*} Substituting $r_i = \frac{1}{3}$, $a_1 = 1$, $a_2 = 16$ and $a_3=25$, we have \begin{align*} \sqrt{\frac{1}{3} (1 + 16 + 25)} = \sqrt{14} \approx 3.74 &\ge \frac{10}{3} \approx 3.33. \end{align*}

Let $\phi$ be a concave function, such as the square root. Then, by Jensen's inequality, we have $ \phi(\mathbb{E}[x]) \ge \mathbb{E}[\phi(x)]. $ Further, if $x$ is an unbiased estimator, then $\mathbb{E}[x] = T$, the true value of the quantity being estimated. We wish to estimate $\phi(T)$, but we have $ \mathbb{E}[\phi(x)] \le \phi(\mathbb{E}[x]) = \phi(T). $ Hence we get an 'underestimate', stochastically speaking.
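To connect this to the question: take $x$ to be the sample variance $s^2$ from the question, which is an unbiased estimator of $T = \sigma^2$, and take $\phi$ to be the square root. The inequality above then reads \begin{align*} \mathbb{E}\bigl[\sqrt{s^2}\,\bigr] = \mathbb{E}[\phi(x)] \;\le\; \phi(T) = \sqrt{\sigma^2} = \sigma, \end{align*} so the sample standard deviation $s$ is, on average, an underestimate of $\sigma$.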