
I'm learning multivariate analysis. I am asked to calculate the covariance matrix of $X=\begin{pmatrix} 3&7 \\ 2&4 \\ 4&7 \end{pmatrix}$

According to p. 8 of Applied Multivariate Statistical Analysis by Richard A. Johnson,

$s_{ik}=\frac{1}{n}\sum^{n}_{j=1}(x_{ji}-\bar{x}_i)(x_{jk}-\bar{x}_k)$, $i=1,2,\ldots,p$, $k=1,2,\ldots,p$.

However, when I use R to compute the covariance, it follows this formula instead: $s_{ik}=\frac{1}{n-1}\sum^{n}_{j=1}(x_{ji}-\bar{x}_i)(x_{jk}-\bar{x}_k)$

I do not know why they differ. How do I decide when to use $\frac{1}{n}$ and when to use $\frac{1}{n-1}$?

  • Please use `\sum` for sums, instead of `\Sigma`. (2012-11-24)

2 Answers

2

The use of $n-1$ rather than $n$ is Bessel's correction.

To find the variance of a probability distribution that puts probability $1/n$ at each of $n$ points, you use $1/n$, not $1/(n-1)$. The denominator $n-1$ is used only when estimating a population variance (or covariance) from a sample. It makes the estimator unbiased.
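To make the difference concrete for the matrix $X$ in the question, here is a minimal Python sketch that computes the covariance matrix with either divisor. The function name `covariance_matrix` and its `ddof` parameter are made up for this illustration; `ddof=0` divides by $n$ (the textbook formula) and `ddof=1` divides by $n-1$ (what R's `cov` does).

```python
def covariance_matrix(rows, ddof):
    """Covariance matrix of the columns of `rows`, dividing by n - ddof.

    `rows` is a list of observations, each a list of p values.
    This helper is hypothetical and written only for this example.
    """
    n = len(rows)
    p = len(rows[0])
    # Column means x-bar_i
    means = [sum(row[i] for row in rows) / n for i in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[k] - means[k]) for row in rows)
            / (n - ddof)
            for k in range(p)
        ]
        for i in range(p)
    ]


X = [[3, 7], [2, 4], [4, 7]]

S_n = covariance_matrix(X, ddof=0)         # divisor n
S_unbiased = covariance_matrix(X, ddof=1)  # divisor n - 1

print(S_n)
print(S_unbiased)
```

With the divisor $n$ the entries are $s_{11}=2/3$, $s_{12}=1$, $s_{22}=2$; with $n-1$ they are $1$, $1.5$ and $3$. The two matrices differ only by the factor $n/(n-1)=3/2$.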

Unbiasedness is slightly overrated. You get a smaller mean squared error with the biased estimator whose denominator is $n$, and a smaller one still when the denominator is $n+1$ (for normally distributed data, that is the smallest possible among estimators of the form $c\sum_j (x_j-\bar{x})^2$).
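The mean squared error comparison can be checked with a small simulation. This is only a sketch under stated assumptions: the data are drawn from a normal distribution, and the sample size, number of trials, seed and the function name `mse_of_variance_estimators` are arbitrary choices made for this example.

```python
import random


def mse_of_variance_estimators(n=5, sigma2=1.0, trials=20000, seed=0):
    """Estimate the MSE of sum((x - mean)^2) / d for d = n-1, n, n+1.

    Assumes normally distributed samples with true variance `sigma2`.
    Returns a dict mapping each divisor d to its simulated MSE.
    """
    rng = random.Random(seed)
    sq_err = {n - 1: 0.0, n: 0.0, n + 1: 0.0}
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
        for d in sq_err:
            sq_err[d] += (ss / d - sigma2) ** 2
    return {d: total / trials for d, total in sq_err.items()}


mse = mse_of_variance_estimators()
for d, value in sorted(mse.items()):
    print(f"divisor {d}: simulated MSE {value:.3f}")
```

For normal data with $n=5$ and $\sigma^2=1$ the theoretical values are $0.5$ for the divisor $n-1$, $0.36$ for $n$ and about $0.333$ for $n+1$, so the simulated numbers should come out in that order.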

2

Both estimators are consistent. The estimator with $1/(n-1)$ is unbiased.
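To see where the $n-1$ comes from, here is a short sketch for a single variable, assuming the observations $x_1,\ldots,x_n$ are independent with common mean $\mu$ and variance $\sigma^2$ (these two symbols are introduced here for illustration). Expanding around $\mu$ gives

$$\sum_{j=1}^{n}(x_j-\bar{x})^2=\sum_{j=1}^{n}(x_j-\mu)^2-n(\bar{x}-\mu)^2,$$

and taking expectations, with $E[(\bar{x}-\mu)^2]=\sigma^2/n$,

$$E\left[\sum_{j=1}^{n}(x_j-\bar{x})^2\right]=n\sigma^2-n\cdot\frac{\sigma^2}{n}=(n-1)\sigma^2.$$

Dividing by $n-1$ therefore gives an estimator with expectation $\sigma^2$, while dividing by $n$ gives expectation $\frac{n-1}{n}\sigma^2$. The factor $\frac{n-1}{n}$ tends to $1$ as $n$ grows, so the bias of the $1/n$ estimator vanishes in large samples.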

http://en.wikipedia.org/wiki/Bias_of_an_estimator#Sample_variance