
What is the variance of the sample variance? In other words I am looking for $\mathrm{Var}(S^2)$.

I have started by expanding $\mathrm{Var}(S^2)$ into $E(S^4) - [E(S^2)]^2$.

I know that $[E(S^2)]^2$ is $\sigma^4$, and that is as far as I got.

  • 0
    Your expressions are very difficult to read. You need to edit and present your question in a better way. (2011-10-16)
  • 1
    One way of expressing $\mathrm{Var}(S^2)$ is given on the Wikipedia page for [variance](http://en.wikipedia.org/wiki/Variance#Distribution_of_the_sample_variance). (2011-10-16)
  • 0
    It doesn't show how they derived it. (2011-10-16)
  • 0
    The solution to the question is in many books. You can easily find it. (2011-10-16)
  • 1
    There is a derivation on MathWorld's [Sample Variance Distribution](http://mathworld.wolfram.com/SampleVarianceDistribution.html) page. They use the "divide by $N$" convention rather than the "divide by $N-1$" convention, though, so you might have to adjust for that. (2011-10-16)
  • 0
    Is there an easier way to do this using the chi-squared distribution with $n-1$ degrees of freedom? (2011-10-16)

4 Answers

1

The best way to understand what the variance of the sample variance looks like is to derive it from scratch.

The following site gives the complete derivation (it runs to over 70 steps) of the sample variance. It takes some time to fully understand how it works, but if you go over the whole derivation several times it becomes quite clear.

You will also see why the proposed sample variance estimator is unbiased.

http://economictheoryblog.wordpress.com/2012/06/28/latexlatexs2/

76

Here's a general derivation that does not assume normality.

Let's rewrite the sample variance $S^2$ as an average over all pairs of indices: $$S^2={1\over{n\choose 2}}\sum_{\{i,j\}} {1\over2}(X_i-X_j)^2.$$ Since $\mathbb{E}[(X_i-X_j)^2/2]=\sigma^2$, we see that $S^2$ is an unbiased estimator for $\sigma^2$.
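The pairwise identity above is easy to verify numerically. A minimal sketch (my own check, plain NumPy, not part of the original answer):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
x = rng.normal(size=10)

# Usual definition: S^2 = (1/(n-1)) * sum (x_i - xbar)^2
s2_usual = x.var(ddof=1)

# Pairwise form: average of (x_i - x_j)^2 / 2 over all unordered pairs {i, j}
s2_pairs = np.mean([(xi - xj) ** 2 / 2 for xi, xj in combinations(x, 2)])

print(s2_usual, s2_pairs)  # the two values coincide
```

The agreement is exact (up to floating point), not just approximate: it follows from the algebraic identity $\sum_{i<j}(x_i-x_j)^2 = n\sum_i(x_i-\bar x)^2$.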

The variance of $S^2$ is the expected value of $$\left({1\over{n\choose 2}}\sum_{\{i,j\}} \left[{1\over2}(X_i-X_j)^2-\sigma^2\right]\right)^2.$$

When you expand the outer square, there are 3 types of cross product terms $$\left[{1\over2}(X_i-X_j)^2-\sigma^2\right] \left[{1\over2}(X_k-X_\ell)^2-\sigma^2\right]$$ depending on the size of the intersection $\{i,j\}\cap\{k,\ell\}$.

  1. When this intersection is empty, the factors are independent and the expected cross product is zero.

  2. There are $n(n-1)(n-2)$ terms where $|\{i,j\}\cap\{k,\ell\}|=1$ and each has an expected cross product of $(\mu_4-\sigma^4)/4$.

  3. There are ${n\choose 2}$ terms where $|\{i,j\}\cap\{k,\ell\}|=2$ and each has an expected cross product of $(\mu_4+\sigma^4)/2$.
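The two nonzero expected cross products in cases 2 and 3 can also be checked by simulation. The sketch below is my own check, not part of the answer; it uses standard normal draws, for which $\sigma^2=1$ and $\mu_4=3$, so the case-2 value should be $(\mu_4-\sigma^4)/4 = 0.5$ and the case-3 value should be $(\mu_4+\sigma^4)/2 = 2$:

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 1_000_000
x, y, z = rng.normal(size=(3, reps))  # i.i.d. N(0, 1): sigma^2 = 1, mu_4 = 3

d_xy = (x - y) ** 2 / 2 - 1.0  # (X_i - X_j)^2 / 2 - sigma^2
d_xz = (x - z) ** 2 / 2 - 1.0  # shares exactly one index with d_xy

case2 = np.mean(d_xy * d_xz)   # expect (mu_4 - sigma^4)/4 = 0.5
case3 = np.mean(d_xy ** 2)     # expect (mu_4 + sigma^4)/2 = 2.0

print(case2, case3)
```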

Putting it all together shows that $$\mbox{Var}(S^2)={\mu_4\over n}-{\sigma^4\,(n-3)\over n\,(n-1)}.$$ Here $\mu_4=\mathbb{E}[(X-\mu)^4]$ is the fourth central moment of $X$.
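As a sanity check on the final formula, here is a small Monte Carlo sketch (mine, not part of the answer) using a deliberately non-normal Exponential(1) population, for which $\sigma^2 = 1$ and $\mu_4 = 9$:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 10, 200_000

# Exponential(1): sigma^2 = 1, mu_4 = E[(X - 1)^4] = 9
s2 = rng.exponential(scale=1.0, size=(reps, n)).var(axis=1, ddof=1)

sim = s2.var()                                       # Monte Carlo Var(S^2)
mu4, sigma4 = 9.0, 1.0
theory = mu4 / n - sigma4 * (n - 3) / (n * (n - 1))  # formula above

print(sim, theory)  # the two closely agree
```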

  • 1
    A related question on stats.SE provides a different solution, and asks for a reference; your input would be appreciated: http://stats.stackexchange.com/q/29905/2750 (2012-06-06)
  • 1
    @Abe Sorry, I don't have any references or worthwhile input. The above is a solution that I made up to teach my students. (2012-06-06)
  • 2
    Thanks, [an answer to the stats.SE question](http://stats.stackexchange.com/a/29945/2750) solved my confusion: the discrepancy was use of kurtosis ($\mu_4$, the fourth central moment) vs. excess kurtosis ($\kappa = \frac{\mu_4}{\sigma^4} - 3$); one reference is [Mood, Graybill and Boes, 1974, Introduction to the Theory of Statistics](http://www.amazon.com/dp/0070428646/?tag=stackoverfl08-20). (2012-06-06)
  • 0
    Prof Byron, so how can one now interpret this result? That the estimated $S^2$ (from an arbitrary distribution) has a mean of $\sigma^2$ and a variance of ${\mu_4\over n}-{\sigma^4\,(n-3)\over n\,(n-1)}$, as from a NORMAL distribution? (2015-03-09)
  • 0
    @ByronSchmuland It's probably too basic, but I have problems with the first expression of the variance as an average over pairs of indices. Is there any way you can send a reference for this equation? Ty (2015-08-03)
  • 0
    @AntoniParellada It is just algebra. Expand the square in the double sum $\sum_{i=1}^n \sum_{j=1}^n (x_i-x_j)^2$ and simplify. It should be pretty clear from there, but if not let me know. (2015-08-03)
  • 0
    @ByronSchmuland Thank you. I'm lacking math background... Are the $i$'s observations from the sampled distribution and the $j$'s means of samples? It doesn't make sense because the sums are from 1 to $n$, and the number of samples can be different from the number of observations in a sample... :-) Totally lost, too much to ask. I'll scour the internet for answers... Thank you! (2015-08-03)
  • 1
    @AntoniParellada The double sum I wrote has nothing to do with probability or statistics. You take any set of $n$ numbers $\{x_1,\dots,x_n\}$ and compute. The $i$ and $j$ are arbitrary indices in $\{1,2,\dots,n\}$. (2015-08-03)
  • 0
    Where can I find the value of the variance of the sample standard deviation, please? $\mathrm{Var}(S)$ (2016-09-14)
  • 5
    In the derivation, how do we see claims 2 and 3, i.e. that the expected value of $$\left[{1\over2}(X-Y)^2-\sigma^2\right] \left[{1\over2}(X-Y)^2-\sigma^2\right]$$ is $(\mu_4+\sigma^4)/2$, for $X, Y$ i.i.d.? (2017-07-28)
  • 0
    There are $4n(n-1)(n-2)$ terms in case 2 and $2n(n-1)$ terms in case 3. (2017-10-05)
  • 0
    Also, the factor on the RHS for $S^2$ is $1/(n(n-1))$. (2017-10-05)
  • 0
    Sorry, I'm a bit late to the party, but how is your definition equivalent to $\frac{1}{n-1} \sum_i(X_i -\bar{X})^2$? (2017-11-19)
  • 0
    Sorry, got it! It's pretty neat. (2017-11-19)
57

Maybe this will help. Let's suppose the samples are taken from a normal distribution. Then, using the fact that $\frac{(n-1)S^2}{\sigma^2}$ is a chi-squared random variable with $(n-1)$ degrees of freedom, we get $$\begin{align*} \text{Var}~\frac{(n-1)S^2}{\sigma^2} & = \text{Var}~\chi^{2}_{n-1} \\ \frac{(n-1)^2}{\sigma^4}\text{Var}~S^2 & = 2(n-1) \\ \text{Var}~S^2 & = \frac{2(n-1)\sigma^4}{(n-1)^2}\\ & = \frac{2\sigma^4}{(n-1)}, \end{align*}$$

where we have used the fact that $\text{Var}~\chi^{2}_{n-1}=2(n-1)$.

Hope this helps.
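Under the normality assumption this is again easy to check by simulation. A quick sketch (my own, with $\sigma = 2$ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma = 10, 200_000, 2.0

# Sample variances of many normal samples of size n
s2 = rng.normal(0.0, sigma, size=(reps, n)).var(axis=1, ddof=1)

sim = s2.var()
theory = 2 * sigma ** 4 / (n - 1)  # 2 sigma^4 / (n-1) from the chi-squared argument

print(sim, theory)  # close agreement
```

Note that this agrees with the general formula of the previous answer, since for a normal population $\mu_4 = 3\sigma^4$.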

  • 12
    Remember that $(n-1)S^2/\sigma^2$ is only guaranteed to be $\chi^2$ when the sample is taken from a normal distribution, though. (2011-10-16)
  • 0
    Thanks Mike. I've edited my answer to reflect what you said in your comment. (2011-10-16)
  • 2
    The question posed is a general one, whereas the answer is distribution-specific. Not appropriate, I am afraid. (2013-04-26)
  • 1
    @afsdfdfsaf Perhaps you should ask that as a separate question. (2014-04-12)
  • 4
    The answer is extremely useful, but would have been even more useful if someone could reference why $(n-1)S^2/\sigma^2$ is chi-squared. (2015-03-09)
  • 0
    I am trying to check this result with a very simple simulation in Matlab, but something seems inconsistent. See http://math.stackexchange.com/questions/1427877/variance-of-variance-estimation-test . Is this expression for the variance of the estimator truly valid? (2015-09-09)
  • 0
    Where can you get the value of $\sigma^2$? (2018-10-11)
4

There can be some confusion in defining the sample variance: $1/n$ vs. $1/(n-1)$. The OP here is, I take it, using the sample variance with $1/(n-1)$, namely the unbiased estimator of the population variance, otherwise known as the second h-statistic:

h2 = HStatistic[2][[2]]  

These sorts of problems can now be solved by computer. Here is the solution using the mathStatica add-on to Mathematica. In particular, we seek the Var[h2], where the variance is just the 2nd central moment, and express the answer in terms of central moments of the population:

CentralMomentToCentral[2, h2]  

(The output is the expression derived in the answer above, $\text{Var}(h_2) = \frac{\mu_4}{n} - \frac{(n-3)\,\mu_2^2}{n\,(n-1)}$, where $\mu_r$ denotes the $r$th central moment of the population.)

We could just as easily find, say, the 4th central moment of the sample variance, as:

CentralMomentToCentral[4, h2] 
