16

Let $X$ be a random variable. Then its variance (dispersion) is defined as $D(X)=E((X-E(X))^2)$. As I understand it, this is supposed to be a measure of how far off from the average we should expect to find the value of $X$.

This would seem to suggest that the natural generalization of variance to the case where $X = (X_1,X_2,\ldots,X_n)$ is a random vector should be $D(X)=E((X-E(X))^T(X-E(X)))$. Here vectors are understood to be columns, as usual. This generalization would again, quite naturally, measure how far off from the average (expectation) we can expect to find the value of the vector $X$.

The usual generalization, however, is $D(X)=E((X-E(X))(X-E(X))^T)$, the variance-covariance matrix which, as I see it, measures the correlation of components.

Why is this the preferred generalization? Is $E((X-E(X))^T(X-E(X)))$ also used and does it have a name?

The variance-covariance matrix does seem to contain more information. Is this the main reason or is there something deeper going on here?

  • Hmmm ... Yes, it does seem immensely more useful for studying inter-relations between random variables. But do probabilists never study random vectors as such? The quantity seems natural enough to deserve a name of its own, I think =) Anyway, thanks for the suggestions so far. (2012-01-16)

2 Answers

12

If $X \in \mathbb{R}^{n\times1}$ is a column-vector-valued random variable, then $V=E((X-E(X))(X-E(X))^T)$ is the variance of $X$ according to the definition given in Feller's famous book. But many authors call it the covariance matrix because its entries are the covariances between the scalar components of $X$.
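
Explicitly, spelling out that definition, the $(i,j)$ entry of $V$ is $V_{ij} = \operatorname{Cov}(X_i, X_j) = E\big((X_i - E(X_i))(X_j - E(X_j))\big)$, so the diagonal entries are the variances $\operatorname{var}(X_i)$ of the individual components.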

It is the natural generalization of the $1$-dimensional case. For example, the $1$-dimensional normal distribution has density proportional to $ \exp\left( \frac{-(x-\mu)^2}{2\sigma^2} \right) $ where $\sigma^2$ is the variance. The multivariate normal has density proportional to $ \exp\left( -\frac12 (x-\mu)^T V^{-1} (x-\mu) \right) $ with $V$ as above.
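
Here is a small numerical check of the role $V$ plays in that density (the mean, matrix, and evaluation point below are made up purely for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

# made-up 2-dimensional example: mean mu and variance (covariance matrix) V
mu = np.array([1.0, -2.0])
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = np.array([0.5, -1.5])

# density written out from the formula above, including the normalizing constant
d = x - mu
const = 1.0 / np.sqrt((2 * np.pi) ** len(mu) * np.linalg.det(V))
pdf_formula = const * np.exp(-0.5 * d @ np.linalg.inv(V) @ d)

# the same density from scipy, which takes V as its "cov" argument
pdf_scipy = multivariate_normal(mean=mu, cov=V).pdf(x)

print(pdf_formula, pdf_scipy)  # the two values agree
```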

The variance satisfies the identity $ \operatorname{var}(AX) = A\Big(\operatorname{var}(X)\Big) A^T. $ The matrix $A$ need not be $n\times n$. It could be $k\times n$, so that $AX$ is $k\times1$ and then both sides of this identity are $k\times k$.
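
A quick simulation sketch of this identity (the particular $V$ and $A$ below are arbitrary, and the sample covariance only approximates the true one):

```python
import numpy as np

rng = np.random.default_rng(0)

# arbitrary 3 x 3 variance matrix V and 2 x 3 matrix A, chosen only for illustration
V = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.4],
              [0.0, 0.4, 1.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  2.0, 1.0]])

# draw many samples of X (one per row), form AX for each, and compare variances
X = rng.multivariate_normal(mean=np.zeros(3), cov=V, size=200_000)
AX = X @ A.T

print(np.cov(AX.T))   # sample variance of AX, a 2 x 2 matrix
print(A @ V @ A.T)    # A var(X) A^T, which the sample variance should approximate
```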

It follows from the (finite-dimensional) spectral theorem that every symmetric non-negative-definite real matrix is the variance of some random vector.
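
A sketch of the construction behind that statement, using an eigendecomposition to build a square root of the given matrix (the matrix below is just an example):

```python
import numpy as np

rng = np.random.default_rng(1)

# an arbitrary symmetric non-negative-definite matrix
V = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# spectral decomposition V = Q diag(lam) Q^T gives a square root R with R R^T = V
lam, Q = np.linalg.eigh(V)
R = Q @ np.diag(np.sqrt(lam)) @ Q.T

# if Z has mean 0 and identity covariance, then X = R Z has variance R I R^T = V
Z = rng.standard_normal((2, 500_000))   # columns are independent standard normal vectors
X = R @ Z

print(np.cov(X))   # close to V, up to sampling error
```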

Look at these:

The last-listed article above has a very elegant argument. The trick of considering a scalar to be the trace of a $1\times 1$ matrix is very nice.
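
In brief, the trick works like this: the scalar in the question is a $1\times1$ matrix, hence equal to its own trace, and the trace is linear, so it commutes with expectation:
$$E\big((X-\mu)^T(X-\mu)\big) = E\Big(\operatorname{tr}\big((X-\mu)(X-\mu)^T\big)\Big) = \operatorname{tr}\Big(E\big((X-\mu)(X-\mu)^T\big)\Big) = \operatorname{tr}(V),$$
where $\mu = E(X)$. In particular, the scalar quantity in the question is just the trace of the variance matrix $V$.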

  • The multivariate central limit theorem is certainly one way in which the multivariate normal distribution arises. (2012-01-16)
7

The covariance matrix has more information, indeed: it has the variance of each component (on the diagonal) and also the covariances between components (off the diagonal). Your value is the sum of the variances of the components. This is often not a very useful measure. For one thing, the components might correspond to entirely different magnitudes, and then it would make little or no sense to sum their variances. Think for example of $X = (X_1,X_2,X_3)$ where $X_1$ is the height of a man, measured in meters, $X_2$ is his waist circumference in centimeters, and $X_3$ is his weight in kilograms...
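
As a rough illustration of how unit-dependent that sum is (the numbers below are made up; the point is only that the centimeter-scale and kilogram-scale components swamp the rest):

```python
import numpy as np

rng = np.random.default_rng(2)

# made-up measurements: height in meters, waist in centimeters, weight in kilograms
n = 100_000
height = rng.normal(1.75, 0.07, n)
waist  = rng.normal(90.0, 8.0, n)
weight = rng.normal(80.0, 10.0, n)
X = np.column_stack([height, waist, weight])

cov = np.cov(X.T)                             # 3 x 3 variance-covariance matrix

centered = X - X.mean(axis=0)
scalar = (centered ** 2).sum(axis=1).mean()   # sample version of E((X-EX)^T (X-EX))

# the scalar equals (up to the 1/n vs 1/(n-1) convention) the trace of the
# covariance matrix, i.e. the sum of the three variances; the height variance
# (about 0.005 m^2) is invisible next to the waist (~64) and weight (~100) variances
print(scalar, np.trace(cov), np.diag(cov))
```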

  • Your value can be useful; what is hardly useful is giving it a special name instead of just "sum of variances", "total energy/power", or "trace of the covariance matrix". See e.g. the convergence analysis of LMS: http://en.wikipedia.org/wiki/Least_mean_squares_filter (2012-01-16)