
How do I compute the derivative \begin{equation} \frac{ \partial {\mathrm{tr}(XX^TXX^T)}}{\partial X}\quad ? \end{equation}

I have no idea where to start.

  • 0
    @Rein, the derivative is really the "linear part" of a function, so if you want to take the derivative at $A$, write $X = A+H$ and capture the linear part involving $H$. 2012-12-14

3 Answers

1

By definition, the derivative of $F(X)=\operatorname{tr}(XX^TXX^T)$ at the point $X$ is the unique linear functional $DF(X):{\rm M}_{n\times m}(\mathbb{R})\to \mathbb{R}$ such that
$ F(X+H)=F(X)+DF(X)\cdot H+r(H) $ with $\lim_{H\to 0} \frac{r(H)}{\|H\|}=0$. We obtain $DF(X)\cdot H$ and $r(H)$ by expanding $F(X+H)$. First we expand the product $(X+H)(X+H)^T(X+H)(X+H)^T$:
\begin{align}
(X+\color{red}{H})(X+\color{red}{H})^T(X+\color{red}{H})(X+\color{red}{H})^T =& (X+\color{red}{H})(X^T+\color{red}{H}^T)\big(XX^T+X\color{red}{H}^T+\color{red}{H}X^T+\color{red}{H}\color{red}{H}^T\big) \\
=&(X+\color{red}{H})\Big(X^TXX^T+X^TX\color{red}{H}^T+X^T\color{red}{H}X^T+X^T\color{red}{H}\color{red}{H}^T \\
&\hspace{12mm}+\color{red}{H}^TXX^T+\color{red}{H}^TX\color{red}{H}^T+\color{red}{H}^T\color{red}{H}X^T+\color{red}{H}^T\color{red}{H}\color{red}{H}^T\Big) \\
=&\;\;\;\;\,XX^TXX^T+XX^TX\color{red}{H}^T+XX^T\color{red}{H}X^T+XX^T\color{red}{H}\color{red}{H}^T \\
&+X\color{red}{H}^TXX^T+X\color{red}{H}^TX\color{red}{H}^T+X\color{red}{H}^T\color{red}{H}X^T+X\color{red}{H}^T\color{red}{H}\color{red}{H}^T \\
&+\color{red}{H}X^TXX^T+\color{red}{H}X^TX\color{red}{H}^T+\color{red}{H}X^T\color{red}{H}X^T+\color{red}{H}X^T\color{red}{H}\color{red}{H}^T \\
&+\color{red}{H}\color{red}{H}^TXX^T+\color{red}{H}\color{red}{H}^TX\color{red}{H}^T+\color{red}{H}\color{red}{H}^T\color{red}{H}X^T+\color{red}{H}\color{red}{H}^T\color{red}{H}\color{red}{H}^T
\end{align}
Separating $XX^TXX^T$ and the terms in which $H$ or $H^T$ appears exactly once, and then applying the trace, we have
\begin{align}
F(X+H)=&\operatorname{tr}\Big( (X+\color{red}{H})(X^T+\color{red}{H}^T)(X+\color{red}{H})(X^T+\color{red}{H}^T) \Big) \\
=&\underbrace{\operatorname{tr} \big(XX^TXX^T\big)}_{F(X)} +\underbrace{\operatorname{tr}\big( XX^TX\color{red}{H}^T+XX^T\color{red}{H}X^T +X\color{red}{H}^TXX^T+\color{red}{H}X^TXX^T \big)}_{DF(X)\cdot H} \\
&+\operatorname{tr}\Big(XX^T\color{red}{H}\color{red}{H}^T +X\color{red}{H}^TX\color{red}{H}^T+X\color{red}{H}^T\color{red}{H}X^T+X\color{red}{H}^T\color{red}{H}\color{red}{H}^T \\
&\hspace{12mm}+\color{red}{H}X^TX\color{red}{H}^T+\color{red}{H}X^T\color{red}{H}X^T+\color{red}{H}X^T\color{red}{H}\color{red}{H}^T \\
&\underbrace{\hspace{12mm}+\color{red}{H}\color{red}{H}^TXX^T+\color{red}{H}\color{red}{H}^TX\color{red}{H}^T+\color{red}{H}\color{red}{H}^T\color{red}{H}X^T+\color{red}{H}\color{red}{H}^T\color{red}{H}\color{red}{H}^T\Big)}_{r(H)}
\end{align}
Here $\|H\|=\sqrt{\operatorname{tr}(HH^T)}$ is the Frobenius norm. Every term of $r(H)$ contains at least two factors of $H$ or $H^T$, so $r(H)=O(\|H\|^2)$ and therefore $\displaystyle\lim_{H\to 0}\frac{r(H)}{\|H\|}=0$. Then the total derivative is
\begin{align}
\mathcal{D}F(X)\cdot H = & \operatorname{tr}\big(XX^TXH^T\big)+ \operatorname{tr}\big(XX^THX^T\big) \\
+ & \operatorname{tr}\big(XH^TXX^T \big)+ \operatorname{tr}\big(HX^TXX^T \big).
\end{align}
Since the trace is invariant under cyclic permutations and under transposition, each of these four terms equals $\operatorname{tr}\big((XX^TX)^TH\big)$, so $\mathcal{D}F(X)\cdot H = 4\operatorname{tr}\big((XX^TX)^TH\big)$.

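The directional derivative in the direction $V$ is $ \frac{\partial}{\partial V}F(X)=\mathcal{D}F(X)\cdot V $ and the partial derivative with respect to the entry $X_{ij}$ is $ \frac{\partial}{\partial E_{ij}}F(X)=\mathcal{D}F(X)\cdot E_{ij}. $ Here $E_{ij}$ is the $n\times m$ matrix whose $(i,j)$ entry is $1$ and whose other entries are $0$.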
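The formula for the total derivative can be sanity-checked numerically. The following is only a sketch, not part of the derivation: it uses plain Python lists, and the helper names (`matmul`, `transpose`, `trace`, `DF`), the matrix sizes, and the step size `t` are all choices made for this illustration. It compares the sum of the four trace terms with a central finite difference of $F$ in a random direction $H$, and also with the compact form $4\operatorname{tr}\big((XX^TX)^TH\big)$.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def F(X):
    """F(X) = tr(X X^T X X^T)."""
    M = matmul(X, transpose(X))
    return trace(matmul(M, M))

def DF(X, H):
    """Linear part of F(X + H): the four trace terms with exactly one H."""
    Xt, Ht = transpose(X), transpose(H)

    def tr4(A, B, C, D):
        return trace(matmul(matmul(A, B), matmul(C, D)))

    return (tr4(X, Xt, X, Ht) + tr4(X, Xt, H, Xt)
            + tr4(X, Ht, X, Xt) + tr4(H, Xt, X, Xt))

random.seed(0)
n, m = 3, 2  # arbitrary sizes chosen for the illustration
X = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
H = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]

# Central finite difference of t -> F(X + tH) at t = 0.
t = 1e-6
X_plus = [[x + t * h for x, h in zip(rx, rh)] for rx, rh in zip(X, H)]
X_minus = [[x - t * h for x, h in zip(rx, rh)] for rx, rh in zip(X, H)]
finite_difference = (F(X_plus) - F(X_minus)) / (2 * t)

linear_part = DF(X, H)

# Compact form 4 tr((X X^T X)^T H), written as an entrywise sum.
G = matmul(matmul(X, transpose(X)), X)
compact_form = 4 * sum(g * h for rg, rh in zip(G, H) for g, h in zip(rg, rh))

print(finite_difference, linear_part, compact_form)
```

The three printed numbers should agree up to the finite-difference error.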

4

Write it down in terms of components. You want to know $\frac{\partial}{\partial X_{ij}} \mathop{\rm tr}(X X^T X X^T) =\frac{\partial}{\partial X_{ij}} ( X_{kl} X_{ml} X_{mn} X_{kn}), $ where summation over repeated indices is implied. Using the fact that $\frac{\partial}{\partial X_{ij}} X_{mn} =\delta_{im} \delta_{jn}$ yields
\begin{align}
\frac{\partial}{\partial X_{ij}} \mathop{\rm tr}(X X^T X X^T) &= \delta_{ik} \delta_{jl} X_{ml} X_{mn} X_{kn} + \delta_{im} \delta_{jl} X_{kl} X_{mn} X_{kn} + \delta_{im} \delta_{jn} X_{kl} X_{ml} X_{kn} + \delta_{ik} \delta_{jn} X_{kl} X_{ml} X_{mn} \\
&= X_{mj} X_{mn} X_{in}+X_{kj} X_{in} X_{kn} + X_{kl} X_{il} X_{kj} +X_{il} X_{ml} X_{mj}\\
&=4(X X^T X)_{ij} .
\end{align}

Or in short notation $\frac{\partial}{\partial X} \mathop{\rm tr}(X X^T X X^T) = 4 X X^T X.$
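The index computation translates almost literally into nested sums, which makes it easy to check entry by entry. The snippet below is a minimal sketch in plain Python; the function names, the $2\times 3$ test matrix, and the step size are assumptions made for this illustration. It evaluates $X_{kl}X_{ml}X_{mn}X_{kn}$ directly and compares $4X_{il}X_{ml}X_{mj}$ with a central finite difference in each entry $X_{ij}$.

```python
import random

def F(X):
    """tr(X X^T X X^T) = X_kl X_ml X_mn X_kn, summed over k, l, m, n."""
    rows, cols = range(len(X)), range(len(X[0]))
    return sum(X[k][l] * X[m][l] * X[m][n] * X[k][n]
               for k in rows for l in cols for m in rows for n in cols)

def gradient(X):
    """Entry (i, j) of 4 X X^T X, i.e. 4 X_il X_ml X_mj summed over l, m."""
    rows, cols = range(len(X)), range(len(X[0]))
    return [[4 * sum(X[i][l] * X[m][l] * X[m][j] for l in cols for m in rows)
             for j in cols] for i in rows]

def numeric_partial(X, i, j, t=1e-6):
    """Central finite difference of F with respect to the entry X_ij."""
    X_plus = [row[:] for row in X]
    X_minus = [row[:] for row in X]
    X_plus[i][j] += t
    X_minus[i][j] -= t
    return (F(X_plus) - F(X_minus)) / (2 * t)

random.seed(1)
X = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
G = gradient(X)

# Largest discrepancy between the closed form and the numerical partials.
max_error = max(abs(G[i][j] - numeric_partial(X, i, j))
                for i in range(2) for j in range(3))
print(max_error)
```

A rectangular test matrix is used on purpose: nothing in the index calculation requires $X$ to be square.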

  • 0
    @Elias: I did not assume any of those. Or where do you spot that I assume something like that? 2012-12-14
2

Define a new matrix variable $M=XX^T$ and write the function in terms of this new variable and the double-dot (aka Frobenius) product. When written in this form, finding the differential and gradient is easy $\eqalign{ f &= M:M \cr \cr df &= 2\,M:dM \cr &= 2\,M:(dX\,X^T+X\,dX^T) \cr &= 2\,(M+M^T):dX\,X^T \cr &= 4\,MX:dX \cr \cr \frac{\partial f}{\partial X} &= 4\,MX \cr &= 4\,XX^TX \cr \cr }$ For reference, the double-dot product is defined such that $A:B=\operatorname{tr}(A^TB)$
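As a rough numerical illustration of the differential $df = 4\,MX:dX$, the sketch below implements the double-dot product as an entrywise sum and compares the predicted first-order change with the actual change of $f$ under a small random perturbation. It is written in plain Python; the helper names, the matrix sizes, and the perturbation scale `1e-6` are assumptions made for this example only.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def frobenius(A, B):
    """Double-dot product A:B, the sum of entrywise products."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def f(X):
    """f = M:M with M = X X^T."""
    M = matmul(X, transpose(X))
    return frobenius(M, M)

random.seed(2)
X = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
dX = [[1e-6 * random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]

# Gradient from the derivation: 4 M X = 4 X X^T X.
MX = matmul(matmul(X, transpose(X)), X)
df_predicted = 4 * frobenius(MX, dX)

# Actual change of f; it differs from df_predicted by second-order terms in dX.
X_new = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]
df_actual = f(X_new) - f(X)

# The same pairing written as a trace, using A:B = tr(A^T B).
trace_form = 4 * trace(matmul(transpose(MX), dX))

print(df_predicted, df_actual, trace_form)
```

The entrywise-sum form of $A:B$ avoids forming $A^TB$ at all, which is one reason this notation is convenient for such calculations.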