4

I have been stuck on the following derivative for some time: $ \frac{\partial\,\mathbf{b}^\mathrm{T}(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-1}\mathbf{b}}{\partial\,\mathbf{X}} $, where $\mathbf{b}\in\mathbb{R}^{M\times1}$, $\mathbf{X}\in\mathbb{R}^{M\times N}$, and $\mathbf{C}\in\mathbb{R}^{N\times N}$ is symmetric.

I had a look in the Matrix Cookbook, but I am still not sure how to deal with the inverse of a matrix in the second order form. Is it correct to apply the chain rule? $\frac{\partial\,\mathbf{b}^\mathrm{T}(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-1}\mathbf{b}}{\partial\,\mathbf{X}} = \frac{\partial\,\mathbf{b}^\mathrm{T}(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-1}\mathbf{b}}{\partial\,\mathbf{XCX}^\mathrm{T}}\cdot \frac{\partial \, \mathbf{XCX}^{\mathrm{T}}}{\partial \, \mathbf{X}}.$

In this case, the first partial derivative will be: $ \frac{\partial\,\mathbf{b}^\mathrm{T}(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-1}\mathbf{b}}{\partial\,\mathbf{XCX}^\mathrm{T}} = -(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-\mathrm{T}}\mathbf{b}\mathbf{b}^\mathrm{T}(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-\mathrm{T}} $ (using Eq. 55, from 1). The second part, $\frac{\partial \, \mathbf{XCX}^{\mathrm{T}}}{\partial \, \mathbf{X}}$, would be something like a fourth-rank tensor. How can I arrive at a result that is an $M\times N$ matrix?

I would really appreciate it if someone could help me with this or offer some advice.

2 Answers

5

Setting $D = X C X^T$ we use (53) from Matrix Cookbook:

$\frac{\partial\,D^{-1}}{\partial \, x_{ij}} = - D^{-1} \frac{\partial\,D}{\partial \, x_{ij}} D^{-1} $

Besides, formula (72) tells us that

$ \frac{\partial \,( X C X^T )}{\partial \, x_{ij}} = X C J^{ij} + J^{ji} C X^T $

(where $J^{ij}$ is the "singleton matrix", with 1 in position $(i,j)$, zero elsewhere).

So

$ \frac{\partial \, b^T (X C X^T)^{-1} b }{\partial \, x_{ij}} = - b^T D^{-1} (X C J^{ij} + J^{ji} C X^T ) D^{-1} b = -2 u^T X C J^{ij} u $

where $u = D^{-1}b$, and we've used the fact that $C$ is symmetric, and hence so is $D$. Now formula (431) says $ u^T A J^{ij} B u = A^T u u^T B^T|_{i,j}$, hence the RHS is equal to

$ -2 C X^T u u^T |_{i,j}$

So

$\frac{\partial \, b^T (X C X^T)^{-1} b }{\partial \, X} = -2 C X^T u u^T = - 2 C X^T (X C X^T)^{-1} b \, b^T (X C X^T)^{-1} $
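
As a cross-check on the shape of this result, here is a short sketch using differentials instead of the entrywise cookbook formulas. The symbol $\mathrm{d}X$ and the trace identity are additions for illustration and are not part of the derivation above; $u = D^{-1}b$ as before, and $C$ (hence $D$) is assumed symmetric.

$ \mathrm{d}\,(b^T D^{-1} b) = - b^T D^{-1} (\mathrm{d}D) D^{-1} b = - u^T ( \mathrm{d}X \, C X^T + X C \, \mathrm{d}X^T ) u $

$ = -2 \, u^T \, \mathrm{d}X \, C X^T u = \operatorname{tr}( -2 \, C X^T u u^T \, \mathrm{d}X ) $

Read this way, $-2 C X^T u u^T$ is the $N\times M$ coefficient of $\mathrm{d}X$ inside the trace, so whether it or its transpose $-2 u u^T X C$ is called "the derivative with respect to $X$" depends on the layout convention in use.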

  • 0
    Thank you very much for the answer and for the derivation! 2011-06-26
1

According to formula (72) in the Matrix Cookbook, $ \frac{\partial (XCX^T)}{\partial x_{ij}} = XCJ^{ji} + J^{ij}CX^T$

Then, as far as I can tell, the final answer becomes the transpose of $-2CX^T uu^T$. This also makes sense dimensionally: the derivative with respect to an $(M\times N)$ matrix should itself be an $(M\times N)$ matrix.
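
One way to settle the layout question is to compare a candidate formula against finite differences. The following is a minimal sketch, assuming NumPy is available; the function names, the random test data, and the step size `eps` are all hypothetical choices made for illustration. It tests the $M\times N$ candidate $-2uu^TXC$, which is the transpose of $-2CX^Tuu^T$ when $C$ is symmetric.

```python
import numpy as np

def f(X, C, b):
    """Scalar objective b^T (X C X^T)^{-1} b."""
    D = X @ C @ X.T
    return float(b @ np.linalg.solve(D, b))

def grad_closed_form(X, C, b):
    """Candidate M x N gradient: -2 u u^T X C with u = D^{-1} b (C symmetric)."""
    D = X @ C @ X.T
    u = np.linalg.solve(D, b)
    return -2.0 * np.outer(u, u) @ X @ C

def grad_finite_diff(X, C, b, eps=1e-6):
    """Central finite differences, perturbing one entry x_ij at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E, C, b) - f(X - E, C, b)) / (2 * eps)
    return G

# Illustrative test data (assumed, not from the question).
rng = np.random.default_rng(0)
M, N = 3, 5
X = rng.standard_normal((M, N))
A = rng.standard_normal((N, N))
C = A @ A.T + N * np.eye(N)  # symmetric positive definite, so X C X^T is invertible
b = rng.standard_normal(M)

G_closed = grad_closed_form(X, C, b)
G_numeric = grad_finite_diff(X, C, b)
max_abs_diff = float(np.max(np.abs(G_closed - G_numeric)))
print(G_closed.shape, max_abs_diff)
```

If the two arrays agree to within finite-difference error, that supports reading the entry $(i,j)$ of $-2uu^TXC$ as $\partial f/\partial x_{ij}$, with $-2CX^Tuu^T$ being the same quantity in the transposed ($N\times M$) layout.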