
I am trying to compute the derivative $\frac{d}{d\boldsymbol{\mu}}\left( (\mathbf{x} - \boldsymbol{\mu})^\top\boldsymbol{\Sigma} (\mathbf{x} - \boldsymbol{\mu})\right)$, where the size of the vectors $\mathbf{x},\boldsymbol{\mu}$ is $n\times 1$ and the size of the matrix $\boldsymbol{\Sigma}$ is $n\times n$.

I tried to break this down as $\frac{d}{d\boldsymbol{\mu}}\left( \mathbf{x}^\top\boldsymbol{\Sigma} \mathbf{x} - \mathbf{x}^\top\boldsymbol{\Sigma} \boldsymbol{\mu} - \boldsymbol{\mu}^\top\boldsymbol{\Sigma} \mathbf{x} + \boldsymbol{\mu}^\top\boldsymbol{\Sigma} \boldsymbol{\mu} \right) $

yielding $(\mathbf{x} + \boldsymbol{\mu})^\top\boldsymbol{\Sigma} + \boldsymbol{\Sigma}(\boldsymbol{\mu} - \mathbf{x})$

but the dimensions don't work: $1\times n + n\times 1$. Any help would be greatly appreciated.

-C

  • What is the broader context of this expression? (2017-11-21)

3 Answers

9

Think of the case $n = 1$: you get $(x-\mu)^{\top} \Sigma (x-\mu) = \sigma (x-\mu)^2$ for some constant $\sigma$ representing the matrix, so the derivative with respect to $\mu$ is $2 \sigma(\mu-x)$. The same thing happens in dimension $n$: the derivative of this function of $\mu$ is its gradient. Write $x = (x_1, \dots, x_n)$, $\mu = (\mu_1, \dots, \mu_n)$ and $\Sigma = (\sigma_{ij})$. Then
$$ f(\mu) = (x-\mu)^{\top} \Sigma (x-\mu) = \sum_{i=1}^n \sum_{j=1}^n (x_i - \mu_i)(x_j - \mu_j)\sigma_{ij}. $$
Computing the partial derivative with respect to the $k^{\text{th}}$ variable, for $1 \le k \le n$, you get
$$\begin{align*} \frac{\partial f}{\partial\mu_{k}} &= \sum_{i=1}^n \sum_{j=1}^n \left( -\delta_{ik} (x_j - \mu_j) \sigma_{ij} \right) + \left( (x_i -\mu_i) (-\delta_{jk}) \sigma_{ij} \right)\\ &= \sum_{j=1}^n (\mu_j - x_j) \sigma_{kj} + \sum_{i=1}^n (\mu_i - x_i) \sigma_{ik}, \end{align*}$$
where $\delta_{ij} = 0$ if $i \neq j$ and $1$ if $i=j$. If you look at the vector $\nabla f = \left( \frac{ \partial f}{\partial \mu_1} , \dots, \frac{ \partial f}{\partial \mu_n}\right)$, you see that its components are precisely those of the vector $\Sigma(\mu-x) + \Sigma^{\top} (\mu-x)$. If the matrix $\Sigma$ is symmetric, this reduces to $2\Sigma(\mu-x)$.
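(Not part of the original answer: the component-wise derivation above can be checked numerically with finite differences. A minimal NumPy sketch, where the dimension $n=4$, the random seed, and the tolerance are arbitrary choices:)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
mu = rng.standard_normal(n)
Sigma = rng.standard_normal((n, n))  # deliberately not symmetric

def f(m):
    # f(mu) = (x - mu)^T Sigma (x - mu)
    d = x - m
    return d @ Sigma @ d

# Analytic gradient from the derivation: Sigma(mu - x) + Sigma^T(mu - x)
grad = Sigma @ (mu - x) + Sigma.T @ (mu - x)

# Central finite differences along each coordinate direction
h = 1e-6
num = np.array([(f(mu + h * e) - f(mu - h * e)) / (2 * h) for e in np.eye(n)])

assert np.allclose(grad, num, atol=1e-5)
```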

Hope that helps,

  • @PatrickDaSilva Could you maybe help me with this question: http://math.stackexchange.com/questions/606646/matrix-derivative-ax-btax-b ? Thanks a lot in advance! (2013-12-16)
8

There is a very short and quick way to calculate this correctly. The object $(x-\mu)^T\Sigma(x-\mu)$ is called a quadratic form, and it is well known (see e.g. here) that the derivative of such a form is

$\frac{\partial x^TAx }{\partial x}=(A+A^T)x$

This works even if $A$ is not symmetric. In your particular example, you use the chain rule as,

$\frac{\partial (x-\mu)^T\Sigma(x-\mu) }{\partial \mu}=\frac{\partial (x-\mu)^T\Sigma(x-\mu) }{\partial (x-\mu)}\frac{\partial (x-\mu)}{\partial \mu}$

Thus,

$\frac{\partial (x-\mu)^T\Sigma(x-\mu) }{\partial (x-\mu)}=(\Sigma +\Sigma^T)(x-\mu)$

and

$\frac{\partial (x-\mu)}{\partial \mu}=-I$ (the $n\times n$ identity matrix)

Combining equations you get the final answer,

$\frac{\partial (x-\mu)^T\Sigma(x-\mu) }{\partial \mu}=(\Sigma +\Sigma^T)(\mu-x)$
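(Not part of the original answer: the chain-rule result can be sanity-checked numerically, including the claim that it holds for non-symmetric $\Sigma$ and collapses to $2\Sigma(\mu-x)$ in the symmetric case. A NumPy sketch with arbitrary random data:)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
x = rng.standard_normal(n)
mu = rng.standard_normal(n)
Sigma = rng.standard_normal((n, n))  # general, non-symmetric case

def f(m):
    d = x - m
    return d @ Sigma @ d

# Chain-rule answer: (Sigma + Sigma^T)(mu - x)
grad = (Sigma + Sigma.T) @ (mu - x)

# Compare against central finite differences
h = 1e-6
num = np.array([(f(mu + h * e) - f(mu - h * e)) / (2 * h) for e in np.eye(n)])
assert np.allclose(grad, num, atol=1e-5)

# Symmetric case: (S + S^T)(mu - x) = 2 S (mu - x)
S = (Sigma + Sigma.T) / 2
assert np.allclose((S + S.T) @ (mu - x), 2 * S @ (mu - x))
```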

3

In full technicality:

$\frac{\partial}{\partial u_k}\left(\sum_{i,j=1}^n \sigma_{ij}(x_i-u_i)(x_j-u_j)\right)=\sum_{i,j=1}^n\sigma_{ij}\left[-\delta_{ik}(x_j-u_j)-(x_i-u_i)\delta_{jk}\right]$

$=-\sum_{l=1}^n (\sigma_{kl}+\sigma_{lk})(x_l-u_l)=\left[(\Sigma+\Sigma^T)(\vec{u}-\vec{x})\right]_k.$

In other words, you should have gotten $\Sigma^T(u-x)$ instead of $(x+u)^T\Sigma$. Note that $a^T\Phi b=b^T\Phi^T a$.
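(Not part of the original answer: the transposition identity used here follows because $a^T\Phi b$ is a scalar and hence equals its own transpose. A quick NumPy check with arbitrary random data:)

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
a = rng.standard_normal(n)
b = rng.standard_normal(n)
Phi = rng.standard_normal((n, n))

# a^T Phi b is a 1x1 quantity, so it equals its transpose b^T Phi^T a
assert np.isclose(a @ Phi @ b, b @ Phi.T @ a)
```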

  • @Didier, Patrick: Sorry, let me explain. I sometimes write $a_i$ to *stand* for $\vec{a}$ (as $k$ ranges over its various indices). Apparently this is very particular to me... (2012-01-03)