Could someone explain this equation?
$ \frac{d \operatorname{tr}(AXB)}{d X} = BA $
I understand that
$ d\operatorname{tr}(AXB) = \operatorname{tr}(BA \; dX) $
but I don't quite understand how to move $dX$ out of the trace.
Try expanding to linear order. This always eases the understanding:
$\operatorname{tr}(A (X+dX)B)=A_{ij} (X_{jk}+dX_{jk})B_{ki}$
where the Einstein summation convention is used. Subtracting $\operatorname{tr}(AXB)$ you get
$\begin{align} d\operatorname{tr}(AXB)&=\operatorname{tr}(A(X+dX)B)-\operatorname{tr}(AXB)\\&=A_{ij} dX_{jk}B_{ki}=\underbrace{B_{ki}A_{ij}}_{=(BA)_{kj}} \; dX_{jk} \end{align}$
The notation is quite misleading (at least for me).
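Since $\operatorname{tr}(AXB)$ is linear in $X$, the increment computed above is exact and not just a first-order approximation. A minimal sketch that checks this on small integer matrices follows; the matrices and the helper names `matmul`, `trace` and `add` are made up for illustration.

```python
# Exact check, on small integer matrices, that the increment of tr(AXB)
# equals tr(BA dX). All matrices below are made-up illustration values.

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def add(P, Q):
    """Entrywise sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
X = [[1, 0], [2, -1], [0, 3]]     # 3 x 2
dX = [[1, -2], [0, 4], [5, 1]]    # 3 x 2 perturbation
B = [[2, 1], [-1, 3]]             # 2 x 2

# tr(A (X + dX) B) - tr(A X B)
increment = trace(matmul(matmul(A, add(X, dX)), B)) - trace(matmul(matmul(A, X), B))
# tr(BA dX), i.e. (BA)_{kj} dX_{jk} summed over j and k
linear_term = trace(matmul(matmul(B, A), dX))

print(increment, linear_term)  # prints: 111 111
```

Integer entries keep the arithmetic exact, so the two numbers can be compared with `==` rather than a tolerance.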
Hint:
Does it make sense that $\frac{\partial}{\partial X_{mn}} \mathop{\rm tr} (A X B) = (B A)_{nm}?$
More information: $\frac{\partial}{\partial X_{mn}} \mathop{\rm tr} (A X B) = \frac{\partial}{\partial X_{mn}} \sum_{jkl} A_{jk} X_{kl} B_{lj} = \sum_{jkl} A_{jk} \delta_{km} \delta_{nl} B_{lj} = \sum_{j} A_{jm} B_{nj} =(B A)_{nm}. $
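The entrywise formula can also be checked numerically. Because $\mathop{\rm tr}(AXB)$ is linear in $X$, adding $1$ to the single entry $X_{mn}$ changes the trace by exactly $\partial \mathop{\rm tr}(AXB) / \partial X_{mn}$. The sketch below uses made-up integer matrices and a hypothetical helper `grad_trace_AXB`. It collects these partial derivatives into a matrix with the same shape as $X$ and compares it with $(BA)^T$.

```python
# Entry-by-entry partial derivatives of f(X) = tr(AXB), compared with (BA)^T.
# Matrices and helper names are illustration-only assumptions.

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def grad_trace_AXB(A, X, B):
    """Matrix G with G[m][n] = d tr(AXB) / d X[m][n], via unit perturbations."""
    def f(Y):
        return trace(matmul(matmul(A, Y), B))
    base = f(X)
    G = [[0] * len(X[0]) for _ in X]
    for m in range(len(X)):
        for n in range(len(X[0])):
            Y = [row[:] for row in X]   # copy X, then bump one entry
            Y[m][n] += 1
            G[m][n] = f(Y) - base       # exact, because f is linear
    return G

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
X = [[1, 0], [2, -1], [0, 3]]     # 3 x 2
B = [[2, 1], [-1, 3]]             # 2 x 2

G = grad_trace_AXB(A, X, B)
print(G == transpose(matmul(B, A)))  # prints: True
```

With this layout, where `G[m][n]` sits in the same position as `X[m][n]`, the derivative matrix is $(BA)^T$. Arranging the same numbers with the indices swapped gives $BA$, which is the form in the question.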
The other answers are correct, but I feel like they missed the point. Arguments that take a basis to prove a result independent of bases should be approached with caution.
First of all, according to the Matrix Cookbook, the formula is $ \frac{d\, \mathrm{tr}(AXB)}{dX} = (BA)^T,$ not the one given in your question.
What's confusing about this presentation is that $f (X) = \mathrm{tr}(AXB)$ is a linear map, so its derivative (= linear approximation) is itself.
So, read literally, the statement would say $ f(X) = \mathrm{tr}(AXB) = (BA)^T,$ which is clearly wrong: the left-hand side is a scalar and the right-hand side is a matrix.
But consider the Frobenius inner product on $\mathrm{Mat}(m, n)$. For $U, V \in \mathrm{Mat}(m, n)$:
$\langle U, V \rangle = \mathrm{tr}(U^T V).$
By the Riesz representation theorem, $f$ can be represented as
$f(X) = \langle U, X \rangle = \mathrm{tr}(U^TX)$
for a fixed $U \in \mathrm{Mat}(m, n)$.
Clearly $U = (BA)^T$ does the job, so the more precise statement is
$\mathrm{tr}(AXB) = \langle (BA)^T, X \rangle,$
which is a triviality.
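To spell out the check that $U = (BA)^T$ works, here is a short worked step. It assumes $A$ is $p \times m$ and $B$ is $n \times p$, so that $AXB$ is square; the size $p$ is my own label and not part of the original statement.

$\begin{align} \langle (BA)^T, X \rangle &= \mathrm{tr}\big(((BA)^T)^T X\big) \\ &= \mathrm{tr}(BAX) \\ &= \mathrm{tr}(AXB) \end{align}$

The last step uses the cyclic invariance of the trace. In coordinates, the representing matrix has entries $U_{mn} = (BA)_{nm}$, which matches the entrywise partial derivatives $\partial f / \partial X_{mn}$ computed in the other answers.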