Let $X$ be a random variable. Then its variance (dispersion) is defined as $D(X)=E((X-E(X))^2)$. As I understand it, this is supposed to be a measure of how far off from the average we should expect to find the value of $X$.
This would seem to suggest that the natural generalization of variance to the case where $X = (X_1,X_2,\ldots,X_n)$ is a random vector should be the scalar $D(X)=E((X-E(X))^T(X-E(X)))$. Here vectors are understood to be columns, as usual. This generalization would again, quite naturally, measure how far from its expectation we can expect to find the value of the vector $X$.
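Expanding the inner product makes explicit what this scalar measures: it is the expected squared Euclidean distance of $X$ from its mean, i.e. the sum of the componentwise variances,
$$E\big((X-E(X))^T(X-E(X))\big) = E\!\left(\sum_{i=1}^n (X_i-E(X_i))^2\right) = \sum_{i=1}^n D(X_i).$$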
The usual generalization, however, is $D(X)=E((X-E(X))(X-E(X))^T)$, the variance-covariance matrix, which, as I see it, measures how the components co-vary with one another.
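Concretely, the $(i,j)$ entry of this matrix is $E\big((X_i-E(X_i))(X_j-E(X_j))\big)=\operatorname{Cov}(X_i,X_j)$, and the scalar I proposed above is recovered from it as the trace:
$$E\big((X-E(X))^T(X-E(X))\big)=\operatorname{tr}\!\Big(E\big((X-E(X))(X-E(X))^T\big)\Big).$$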
Why is this the preferred generalization? Is $E((X-E(X))^T(X-E(X)))$ also used, and does it have a name?
The variance-covariance matrix does seem to contain strictly more information: the scalar can be read off from it as its trace, but the matrix cannot be recovered from the scalar. Is this the main reason, or is there something deeper going on here?
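As a toy illustration of that information gap (with numbers of my own choosing): in $n=2$ dimensions the covariance matrices
$$\Sigma_1=\begin{pmatrix}1&0\\0&1\end{pmatrix},\qquad \Sigma_2=\begin{pmatrix}1&0.9\\0.9&1\end{pmatrix}$$
both have trace $2$, so the scalar generalization cannot distinguish an uncorrelated pair of components from a strongly correlated one, while the matrices plainly differ.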