
I'm a bit stumped by the exponential family representation of a multivariate Gaussian distribution. The exponential family form is a generic form for a large class of probability distributions. The standard form is

$$f_X(x) = \exp[\theta' T(x) + F(\theta)]$$

where $\theta$ is a vector of parameters (based on $\mu$ and $\Sigma$), $T(x)$ is a vector of sufficient statistics, and $F$ is a function of the parameters that ensures the distribution is a pdf, i.e., integrates to one. For more information on this form, see http://www.cs.columbia.edu/~jebara/4771/tutorials/lecture12.pdf, http://en.wikipedia.org/wiki/Exponential_family, etc.

The "conversion" of a multivariate Gaussian distribution to exponential family form is listed as

$$\theta = [\Sigma^{-1}\mu, -\frac{1}{2}\Sigma^{-1}]'$$ $$T(x) = [x, x x']'$$

but this is confusing because the outer product $x x'$ is a matrix and $-\frac{1}{2}\Sigma^{-1}$ is also a matrix. Thus, it seems the product between $\theta$ and $T(x)$ should result in a scalar "entry" and a matrix "entry". Obviously, this expression needs to evaluate to a scalar.

The inner product works fine in the scalar case, and I understand this conversion is computed by manipulating to the quadratic form $x'Ax + b'x$. Still, it seems that I am completely missing something here. Thanks for your help.

  • 1
    Consider the "vectorized" version of $xx'$, then you'll see it matches the form. Or, just write it out in summation notation. Let $S = \Sigma^{-1}$ and $s_{ij}$ be the $(i,j)$th element of $S$. Define $y_{ij} = x_i x_j$. Let $u_i$ be the $i$th element of $S \mu$. Then the exponent of the density is $-\frac{1}{2} \sum_i \sum_j s_{ij} y_{ij} + \sum_i u_i x_i + h(S,u)$, which has the required form. You need to know that the $y_{ij}$ and $x_i$ are linearly independent, as well as the $s_{ij}$ and $u_i$ (so that you know you're not dealing with a *curved* exponential family). That's not hard. 2011-03-13
  • 0
    @cardinal Indeed, *That's not hard*, but @RandomGuy is right to point out there is at least an abuse of notation here, since what is *written* is a matrix product and what is *meant* is the scalar product of two $n\times n$ matrices transformed into vectors of size $n^2$ (as you aptly explain). If only every question on MSE could be as relevant as this one! 2011-03-13
  • 0
    @Didier, @RandomGuy, sorry I ran out of characters in my comment. My last sentence was supposed to say: "This last part is not hard to show." I truncated to meet the character restrictions and it came out sounding much more flippant. Sorry about that. It's a good question @RandomGuy. If you throw a trace around the matrix product, the author's original comment makes more sense. I wonder if that's what he meant. 2011-03-13
  • 0
    Thank you for your answer. I see, it is fairly obvious once you understand the notation. If you convert your comment to an "answer", I'd be happy to accept it as the answer to this question. 2011-03-20
  • 0
    @cardinal +1 to make it a formal answer, I just found this thread when wondering about the very same thing. 2015-06-04
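
For anyone who wants to check the vectorization and trace readings from the comments numerically, here is a minimal NumPy sketch. It is not taken from the linked notes: the dimension, seed, and random parameters are made up for illustration, and $F(\theta)$ is written in terms of $\mu$ and $\Sigma$ rather than $\theta$ to keep it short. It flattens $-\frac{1}{2}\Sigma^{-1}$ and $xx'$ into vectors of length $n^2$, stacks them under $\Sigma^{-1}\mu$ and $x$, and compares $\theta' T(x) + F$ with the log-density computed directly.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
n = 3                           # arbitrary dimension, illustration only

# Hypothetical parameters: a random mean and a symmetric positive-definite covariance.
mu = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)
S = np.linalg.inv(Sigma)        # S = Sigma^{-1}, as in the comment above

x = rng.normal(size=n)          # point at which to evaluate the density

# theta and T(x) with the matrix blocks flattened into vectors of length n^2.
theta = np.concatenate([S @ mu, (-0.5 * S).ravel()])
T = np.concatenate([x, np.outer(x, x).ravel()])

# Normalizing term F, expressed through mu and Sigma for readability.
_, logdet = np.linalg.slogdet(Sigma)
F = -0.5 * (mu @ S @ mu) - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

# Exponential-family form: an ordinary inner product plus F gives a scalar.
log_pdf_expfam = theta @ T + F

# Direct evaluation of the multivariate Gaussian log-density.
d = x - mu
log_pdf_direct = -0.5 * (d @ S @ d) - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

# Three ways of writing the quadratic term: vectorized, trace, and x' S x.
quad_vec = (-0.5 * S).ravel() @ np.outer(x, x).ravel()
quad_trace = np.trace((-0.5 * S) @ np.outer(x, x))
quad_direct = -0.5 * (x @ S @ x)

print(np.isclose(log_pdf_expfam, log_pdf_direct))
print(np.isclose(quad_vec, quad_trace), np.isclose(quad_vec, quad_direct))
```

The point is only that the "product" of the two matrix entries is the sum $\sum_i \sum_j s_{ij} y_{ij}$ (up to the factor $-\frac{1}{2}$), which equals the trace of the matrix product, so the exponent is a scalar as required.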

1 Answer