
I want to calculate the Covariance matrix of an n-dimensional normal distribution given by $Y=AX+a$ where $X=(X_1,...,X_n)$ with each $X_i$ a standard normal distribution.

I have calculated the density of $Y$ as $$f(y)=\frac{1}{(2\pi)^{\frac{n}{2}}|\det(A)|}e^{-\frac{1}{2}(y-a)^{T}(AA^{T})^{-1}(y-a)}$$ which according to my notes is correct. Wikipedia gives the PDF as $$f(y)=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{1/2}}e^{-\frac{1}{2}(y-a)^{T}\Sigma^{-1}(y-a)}$$

with covariance matrix $\Sigma$, from which I infer that I should have $\Sigma=AA^{T}$, i.e. my covariance matrix should be given by $AA^{T}$.

But doing the actual calculation I get as covariance of the components $Y_k,Y_l$, with expectations $a_k, a_l$ respectively:
$$\begin{align*}
\operatorname{Cov}(Y_k,Y_l)&=\mathbb{E}[(Y_k-a_k)(Y_l-a_l)]=\mathbb{E}[Y_kY_l-a_kY_l-a_lY_k+a_ka_l]=\mathbb{E}[Y_kY_l]-a_ka_l\\
&=\mathbb{E}[(AX+a)_k(AX+a)_l]-a_ka_l=\mathbb{E}\left[\left(X_1\sum_{i=1}^na_{ki}+a_k\right)\left(X_1\sum_{i=1}^na_{li}+a_l\right)\right]-a_ka_l\\
&=\mathbb{E}\left[X_1^2\left(\sum_{i=1}^na_{ki}\right)\left(\sum_{i=1}^na_{li}\right)+a_lX_1\sum_{i=1}^na_{ki}+a_kX_1\sum_{i=1}^na_{li}+a_ka_l\right]-a_ka_l\\
&=\mathbb{E}[X_1^2]\left(\sum_{i=1}^na_{ki}\right)\left(\sum_{i=1}^na_{li}\right)=\left(\sum_{i=1}^na_{ki}\right)\left(\sum_{i=1}^na_{li}\right)
\end{align*}$$
where in the last two steps I have used linearity of expectation and the fact that the components are standard normally distributed, i.e. $\mathbb{E}[X_1]=0$ and $\mathbb{E}[X_1^2]=1$.

However, this isn't equal to $(AA^{T})_{kl}=\sum_{i=1}^{n}a_{ki}a_{li}$.

Does somebody see what I did wrong/what I am missing?

2 Answers


$$\mathbb{E}[(AX+a)_k (AX+a)_l] = \mathbb{E} \left[ \left( X_1 \sum_{i=1}^n a_{ki} + a_k \right) \left( X_1 \sum_{i=1}^n a_{li} + a_l \right) \right]$$

does not hold true. Instead it should read

$$\mathbb{E}[(AX+a)_k (AX+a)_l] = \mathbb{E} \left[ \left( \sum_{i=1}^n a_{ki} X_i + a_k \right) \left( \sum_{j=1}^n a_{lj} X_j + a_l \right) \right]. \tag{1}$$

Note that this makes a difference since the distribution of the vector $(X_1,X_1)$ does not equal the distribution of $(X_i,X_j)$ (this means that we cannot simply replace $X_i$ and $X_j$ in $(1)$ by $X_1$). Clearly, by $(1)$,

$$\begin{align*} \mathbb{E}[(AX+a)_k (AX+a)_l] &= \sum_{i=1}^n \sum_{j=1}^n a_{ki} a_{lj} \mathbb{E}(X_i X_j) + a_l \mathbb{E} \left( \sum_{i=1}^n a_{ki} X_i \right) \\ &\quad + a_k \mathbb{E} \left( \sum_{j=1}^n a_{lj} X_j \right) + a_k a_l \\ \end{align*}$$

Although it is not mentioned explicitly in your question, I take it that $X_1,\ldots,X_n$ are independent random variables. Using that $\mathbb{E}(X_i X_j) = 0$ for all $i \neq j$ and $\mathbb{E}(X_i)=0$ for all $i$, we get

$$\mathbb{E}[(AX+a)_k (AX+a)_l] = \sum_{i=1}^n a_{ki} a_{li} + a_k a_l = (A A^T)_{k,l} + a_k a_l.$$ Hence, $$\text{cov}(Y_k,Y_l) = (A A^T)_{k,l}.$$
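A quick Monte Carlo sanity check of this conclusion (my own sketch, not from the original post; the matrix $A$ and shift $a$ below are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3
A = rng.standard_normal((n, n))   # arbitrary real mixing matrix
a = rng.standard_normal(n)        # shift vector

N = 200_000                       # number of samples
X = rng.standard_normal((n, N))   # each column is one draw of X with i.i.d. N(0,1) entries
Y = A @ X + a[:, None]            # Y = A X + a, applied columnwise

emp_cov = np.cov(Y)               # empirical covariance of the samples
print(np.max(np.abs(emp_cov - A @ A.T)))  # small, shrinking like O(1/sqrt(N))
```

The empirical covariance of the simulated $Y$ agrees with $AA^T$ up to Monte Carlo error.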

- Could you elaborate a bit more on why the first equation doesn't hold true? What does the vector $(X_i,X_j)$ have to do with that? (2017-01-29)
- @see I simply used the very definition of $Y$. We have $$Y_k = \sum_{i} a_{ki} X_i + a_k,$$ right, and **not** $$Y_k = X_1 \sum_{i} a_{ki} + a_k.$$ (2017-01-29)
- Well, the actual thought process of getting there was of course $Y_k=\sum_i a_{ki}X_i+a_k=\sum_i a_{ki}X_1+a_k=X_1\sum_i a_{ki}+a_k$, since each $X_i$ has the same distribution, namely a univariate standard normal, i.e. they are all the same as $X_1$. How is this wrong? (2017-01-29)
- @see Well, for instance, $X_1+X_2$ does not have the same distribution as $X_1+X_1 = 2X_1$. You can see this, for instance, by calculating the variance of each random variable. (2017-01-30)
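The variance comparison in that last comment is easy to check numerically (a sketch of my own, not part of the original thread):

```python
import numpy as np

# X1 and X2 are i.i.d. standard normal, yet X1 + X2 and 2*X1 have
# different distributions: Var(X1 + X2) = 2 while Var(2*X1) = 4.
rng = np.random.default_rng(1)
N = 500_000
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)

var_sum = np.var(x1 + x2)    # approximately 2
var_twice = np.var(2 * x1)   # approximately 4
print(var_sum, var_twice)
```

This is exactly why the $X_i$'s in the sum cannot simply be replaced by $X_1$: equality in distribution of the individual components does not make the sums equal in distribution.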

A much cleaner way is to do the calculation with vectors and matrices, exploiting the linearity of expectation.

$E[Y] = E[AX+a] = A E[X] + a = A (0) +a = a$ by linearity of expectation.

Then, $cov(Y) = E[ (Y - E[Y]) (Y - E[Y])^T] = E[ (AX + a - a) (AX + a - a)^T] = E[(AX) (AX)^T] = E[A X X^T A^T ] = A E[X X^T] A^T$, where the last step follows by linearity of expectation.

Now, the $i,j$-th entry of $X X^T$ is $X_i X_j$. So, taking the expectation elementwise, and noting $E[X_i X_j]$ is $1$ if $i=j$ and $0$ otherwise, we see that $cov(Y) = A A^T$.

From this, you can easily see that so long as the components of $X$ are uncorrelated with unit variance and have mean zero, $Y$ still has the covariance matrix $A A^T$ -- Gaussian distribution not required.
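To illustrate that closing remark numerically (my own sketch, with an arbitrary $A$ and $a$): feed $Y = AX + a$ with non-Gaussian components that are merely mean-zero, unit-variance, and uncorrelated, and the covariance still comes out as $AA^T$.

```python
import numpy as np

rng = np.random.default_rng(2)

n, N = 3, 200_000
A = rng.standard_normal((n, n))   # arbitrary mixing matrix
a = rng.standard_normal(n)        # shift vector

# Uniform on (-sqrt(3), sqrt(3)) has mean 0 and variance 1, but is not Gaussian.
X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, N))
Y = A @ X + a[:, None]

emp_cov = np.cov(Y)
print(np.max(np.abs(emp_cov - A @ A.T)))  # small despite X being non-Gaussian
```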

- Well yeah, but I would have to prove $E[AX]=AE[X]$ first (which I can). (2017-01-29)
- Yeah -- that's a tiny bit of work relative to the multiplications in your original post. Just write out the $i$-th entry of $AX$ and then use linearity of expectation on it. (2017-01-29)
- Well, it's most certainly 'less to carry', thanks for the tip. (2017-01-29)