8

Hello everyone and happy new year! May all your hopes and aspirations come true and the forces of evil be confused and disoriented on the way to your house.

With that out of the way...

I am trying to write a program that takes a vector $\mu \in \mathbb R^n$ and a matrix $\Sigma \in \mathbb R^{n \times n}$ and generates random samples from the multivariate normal distribution with mean $\mu$ and covariance $\Sigma$.

The problem: I am only allowed to sample from the univariate normal distribution with mean $0$ and variance $1$: $N(0, 1)$.

The proposed solution: Define a vector $v \in \mathbb R^n$ (initially all zeros), and for each $i$ from $1$ to $n$ draw from the univariate normal distribution: $v_i \sim N(0, 1)$.

Now do a Cholesky decomposition on $\Sigma$: $\Sigma = LL^T$.

Finally, the random vector distributed according to the multivariate Gaussian is $Lv + \mu$.
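In code, the proposed procedure might look like the following minimal NumPy sketch (the function name `sample_mvn` and the use of `numpy.linalg.cholesky` are my own choices for illustration, not part of the question):

```python
import numpy as np

def sample_mvn(mu, Sigma, rng):
    """Draw one sample from N(mu, Sigma) using only independent N(0,1) draws."""
    v = rng.standard_normal(len(mu))  # step 1: v_i ~ N(0, 1), independent
    L = np.linalg.cholesky(Sigma)     # step 2: Sigma = L @ L.T
    return L @ v + mu                 # step 3: the claimed multivariate sample
```

With many samples, the empirical mean and covariance should approach $\mu$ and $\Sigma$.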

My question is: why? I don't understand the intuition. If it were a one-dimensional distribution $N(\mu, \sigma^2)$, then I understand why $\sigma^2 v + \mu$ is a good idea, so why Cholesky? Wouldn't we want $\Sigma v + \mu$?

  • There is a flaw in your understanding: in the single-dimensional case our random sample is $\sigma v+\mu$, not $\sigma^2v + \mu$. In the multivariate case $\Sigma$ plays the role of $\sigma^2$. (2017-01-01)
  • Even so, why would the "root" of the covariance be the Cholesky factor? I can see why that seems similar, but I think it demands an explanation. What if there is another matrix $A$, not equal to $L$, such that $AA^T = \Sigma$? Why wouldn't that be just as good a fit as $L$? (2017-01-01)

2 Answers

6

After the comment of Rahul you understood that in any parametrization $x=Av+\mu$ you will need $$ \Sigma=\Bbb E\,(x-\mu)(x-\mu)^T=A\cdot\Bbb E(vv^T)\cdot A^T=AA^T. $$ There are infinitely many possibilities to choose $A$: for any orthogonal matrix $Q$, $\tilde A=AQ$ also satisfies that condition.

One could even choose the square root of $\Sigma$ (which exists and is unique among the symmetric positive definite matrices).

The advantage of using the Cholesky factorization is that you have a cheap and easy algorithm to compute it.
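This non-uniqueness can be checked numerically. The sketch below (a hedged illustration: the particular $\Sigma$, the QR trick for producing a random orthogonal $Q$, and the eigendecomposition route to the symmetric square root are all my own choices) shows that both $LQ$ and the s.p.d. square root reproduce $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])

# The Cholesky factor is one choice of A with A A^T = Sigma...
L = np.linalg.cholesky(Sigma)

# ...but for any orthogonal Q, A = L Q works too, since Q Q^T = I.
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))
A = L @ Q

# The unique symmetric positive definite square root, via eigendecomposition.
w, V = np.linalg.eigh(Sigma)
S = V @ np.diag(np.sqrt(w)) @ V.T

print(np.allclose(A @ A.T, Sigma), np.allclose(S @ S, Sigma))  # True True
```

Any of these factors would give correctly distributed samples; Cholesky is simply the cheapest to compute.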

1

If all the variables in the multivariate Gaussian were independent, we could simply use $X_i =\sigma_i \nu_i+\mu_i$. Since they are correlated, we have (in the bivariate case) $X_1 = \sigma_1\nu_1+\mu_1$ and $X_2 = \sigma_2[\rho_{12}\nu_1+\sqrt{1-\rho_{12}^2}\,\nu_2]+\mu_2$, and this extends to $N$ dimensions. Note: $$\Sigma = \begin{bmatrix}\sigma_1^2 &\rho_{12} \sigma_1\sigma_2 &\rho_{13} \sigma_1\sigma_3 & \dots \\ \rho_{12} \sigma_1\sigma_2 &\sigma_2^2 &\rho_{23} \sigma_2\sigma_3 & \dots \\ \vdots & \vdots & \vdots & \ddots\end{bmatrix}$$ By decomposing through Cholesky, $\Sigma=LL^T$, we get our $X = L\nu+\mu$ without manual calculations, which become quite tedious in higher dimensions.
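For the bivariate case, one can verify that the Cholesky factor of $\Sigma$ is exactly the matrix of coefficients in the formulas for $X_1$ and $X_2$ above. A short NumPy check (the numeric values of $\sigma_1$, $\sigma_2$, $\rho_{12}$ are illustrative):

```python
import numpy as np

s1, s2, rho = 2.0, 1.5, 0.7   # illustrative sigma_1, sigma_2, rho_12
Sigma = np.array([[s1**2,     rho*s1*s2],
                  [rho*s1*s2, s2**2    ]])

L = np.linalg.cholesky(Sigma)

# Coefficients read off the bivariate formulas:
# X1 = s1*v1 + mu1,  X2 = rho*s2*v1 + s2*sqrt(1 - rho^2)*v2 + mu2
L_manual = np.array([[s1,     0.0                   ],
                     [rho*s2, s2*np.sqrt(1 - rho**2)]])

print(np.allclose(L, L_manual))  # True
```

So the Cholesky routine just automates the coefficient bookkeeping that becomes unwieldy by hand in higher dimensions.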