I looked at the Matrix Cookbook and found an expression that is close: $ \frac{\partial a^{T} X^{-1}b}{\partial X}=-X^{-T}ab^{T}X^{-T}$. But there $a$ and $b$ are vectors, while in my case I have matrices. Any help is appreciated.
Derivative of $ \frac{\partial A^{T} X^{-1}A}{\partial X}$
-
Well, you have the small problem that the derivative you're looking for is a tensor. – 2017-01-06
-
@Exodd Why is that a problem? He's looking for that tensor... – 2017-01-06
-
I answered a similar question giving some hints: http://math.stackexchange.com/questions/1096581/derivative-of-a-vector/1096655#1096655. I think it is useful in your case. – 2017-01-06
3 Answers
In general, I like to write these calculations in index form (using the Einstein convention that repeated indices imply a sum). In that notation, the $(j,m)$ entry of $A^{T} X^{-1} A$ is $A_{ij} (X^{-1})_{ik} A_{km}$, and since $A$ does not depend on $X$ you'll have $ \frac{\partial \left[ A_{ij} (X^{-1})_{ik} A_{km} \right]}{\partial X_{ab}} = A_{ij} \frac{\partial (X^{-1})_{ik}}{\partial X_{ab}} A_{km}$. You can look up (in the Matrix Cookbook, among other places) the derivative that you need, namely $\frac{\partial (X^{-1})_{ik}}{\partial X_{ab}} = -(X^{-1})_{ia}(X^{-1})_{bk}$. When you substitute that back in, you'll end up with the rank-4 tensor $-A_{ij}(X^{-1})_{ia}(X^{-1})_{bk}A_{km}$, which has four indices ($j$, $m$, $a$, $b$) that aren't summed over. It reduces to the result that you stated (with $a = b$) when the matrix $A$ has a single column.
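The index expression translates almost literally into `numpy.einsum`. The following is a minimal sketch, not part of the original answer: the sizes, the random test matrices, and the names `T`, `T_col` and `cookbook` are illustrative assumptions. It builds the rank-4 tensor and compares the one-column case with the Matrix Cookbook formula from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                                       # assumed sizes, for illustration only
A = rng.standard_normal((n, m))
X = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted so X is safely invertible

Xinv = np.linalg.inv(X)

# T[j, m, a, b] = d (A^T X^{-1} A)_{jm} / d X_{ab}
#               = -A_{ij} (X^{-1})_{ia} (X^{-1})_{bk} A_{km}   (sum over i and k)
T = -np.einsum("ij,ia,bk,km->jmab", A, Xinv, Xinv, A)

# Special case: A has a single column a, so the output is a scalar and the
# tensor collapses to an n x n matrix indexed by (a, b).
a = A[:, [0]]
T_col = -np.einsum("ij,ia,bk,km->jmab", a, Xinv, Xinv, a)[0, 0]

# Matrix Cookbook formula with both vectors equal to a: -X^{-T} a a^T X^{-T}
cookbook = -Xinv.T @ a @ a.T @ Xinv.T

print(T.shape)                       # four free indices: (m, m, n, n)
print(np.allclose(T_col, cookbook))  # do the two expressions agree?
```

Storing the full tensor costs $m^2 n^2$ numbers, which is one reason the other answers work with the derivative as a linear map instead.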
Recall that the derivative of the map $\iota\colon X \mapsto X^{-1}$ at $X$ is the linear map $$ D\iota(X)H = -X^{-1}HX^{-1}. $$ Since $A$ is constant, we therefore have $$ \partial_X(A^tX^{-1}A)H = -A^tX^{-1}HX^{-1}A. $$
-
Thanks for your answer. Can you please clarify what $H$ is? – 2017-01-06
-
The derivative is a linear map, but in the case of matrices, you cannot directly express the map in the form $x \mapsto Ax$, so one specifies the map by how it acts on an arbitrary matrix $H$ (the direction of the perturbation). – 2017-01-06
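Once the matrices are flattened into vectors, the same linear map can be written as an ordinary matrix-vector product. The sketch below is an illustration under stated assumptions, not something from the thread: it uses column-major vectorization and the standard identity $\operatorname{vec}(PHQ) = (Q^{T} \otimes P)\operatorname{vec}(H)$, and the names `dF`, `vec` and `J` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3                                       # assumed sizes, for illustration only
A = rng.standard_normal((n, m))
X = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted so X is safely invertible
H = rng.standard_normal((n, n))                   # an arbitrary direction
Xinv = np.linalg.inv(X)


def dF(H):
    """Derivative of X -> A^T X^{-1} A at X, applied to the matrix H."""
    return -A.T @ Xinv @ H @ Xinv @ A


def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")


# vec(P H Q) = (Q^T kron P) vec(H), here with P = A^T X^{-1} and Q = X^{-1} A
P = A.T @ Xinv
Q = Xinv @ A
J = -np.kron(Q.T, P)                              # shape (m*m, n*n)

print(J.shape)
print(np.allclose(J @ vec(H), vec(dF(H))))        # same map, two representations
```

The matrix `J` holds the same $m^2 n^2$ numbers as the rank-4 tensor, just arranged in two dimensions.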
Given $\mathrm A \in \mathbb R^{n \times m}$, let $\mathrm F : \mathbb R^{n \times n} \to \mathbb R^{m \times m}$ be defined, for invertible $\mathrm X$, by
$$\mathrm F (\mathrm X) := \mathrm A^{\top} \mathrm X^{-1} \mathrm A$$
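Hence, using $\left( \mathrm I_n + h \mathrm M \right)^{-1} = \mathrm I_n - h \mathrm M + O(h^2)$ for small $h$,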
$$\begin{array}{rl} \mathrm F (\mathrm X + h \mathrm V) &= \mathrm A^{\top} \left( \mathrm X + h \mathrm V \right)^{-1} \mathrm A\\ &= \mathrm A^{\top} \left( \mathrm I_n + h \mathrm X^{-1} \mathrm V \right)^{-1} \mathrm X^{-1} \mathrm A\\ &= \mathrm A^{\top} \left( \mathrm I_n - h \mathrm X^{-1} \mathrm V \right) \mathrm X^{-1} \mathrm A + O(h^2)\\ &= \mathrm A^{\top} \mathrm X^{-1} \mathrm A - h \mathrm A^{\top} \mathrm X^{-1} \mathrm V \mathrm X^{-1} \mathrm A + O(h^2)\\ &= \mathrm F (\mathrm X) - h \mathrm A^{\top} \mathrm X^{-1} \mathrm V \mathrm X^{-1} \mathrm A + O(h^2)\end{array}$$
Thus, the directional derivative of $\mathrm F$ in the direction of $\mathrm V$ at $\mathrm X$ is $$D_{\mathrm V} \mathrm F (\mathrm X) = - \mathrm A^{\top} \mathrm X^{-1} \mathrm V \mathrm X^{-1} \mathrm A$$
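The formula can be sanity-checked numerically with a central finite difference. This is a minimal sketch under assumed sizes and random test data; the step `h` and the names `analytic`, `numeric` and `err` are illustrative choices, not part of the answer.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 2                                       # assumed sizes, for illustration only
A = rng.standard_normal((n, m))
X = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted so X is safely invertible
V = rng.standard_normal((n, n))                   # direction of the derivative


def F(X):
    """F(X) = A^T X^{-1} A, computed with a linear solve instead of an explicit inverse."""
    return A.T @ np.linalg.solve(X, A)


# Analytic directional derivative: -A^T X^{-1} V X^{-1} A
Xinv_A = np.linalg.solve(X, A)                    # X^{-1} A
At_Xinv = np.linalg.solve(X.T, A).T               # (X^{-T} A)^T = A^T X^{-1}
analytic = -At_Xinv @ V @ Xinv_A

# Central difference approximation of the limit of (F(X + hV) - F(X)) / h
h = 1e-6
numeric = (F(X + h * V) - F(X - h * V)) / (2 * h)

err = np.max(np.abs(numeric - analytic))
print(err)                                        # should be small compared with the entries
```

The central difference is used because its truncation error is $O(h^2)$ rather than the $O(h)$ of the one-sided quotient.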