The problem is to compute $\frac {\partial \mathrm{tr}(Q^TQAQ^TQA)}{\partial q_i}$, where $Q=[q_1,...,q_N]$ is an $N\times N$ matrix and each $q_i$ is an $N$-dimensional vector.

I have thought of using the chain rule, but I am confused about how to apply it in matrix calculus.

For example, if we let $X=Q^TQA$, the problem becomes $\frac {\partial \mathrm{tr}(X^2)}{\partial q_i}$. If I apply the chain rule as in scalar differentiation, it becomes $\frac {\partial \mathrm{tr}(X^2)}{\partial X} $ $\frac {\partial X}{\partial q_i}$, which seems to be invalid.

  • 0
    @Fabian They are columns of $Q$. 2012-12-28

3 Answers

1

I do not see why the chain rule would be invalid; you can in fact proceed using it. For such questions it is usually easiest to write everything out in components. So first, we need $\partial_{X_{mn}} \mathop{\rm tr}X^2 =\partial_{X_{mn}} X_{ij}X_{ji} = \delta_{mi} \delta_{nj} X_{ji}+ \delta_{mj} \delta_{ni} X_{ij} = 2 X_{nm};$ here, we have assumed that all indices which appear twice are summed over and used the fact that $\partial_{X_{mn}} X_{ij} = \delta_{mi} \delta_{nj}$.
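The component identity $\partial_{X_{mn}} \mathop{\rm tr}X^2 = 2X_{nm}$ can be sanity-checked numerically. The following is a minimal sketch using central finite differences on a random matrix; the helper names `trace_of_square` and `finite_difference_gradient` are made up for this illustration.

```python
import numpy as np

def trace_of_square(X):
    """Return tr(X^2) for a square matrix X."""
    return np.trace(X @ X)

def finite_difference_gradient(f, X, eps=1e-6):
    """Central-difference estimate of d f / d X[m, n] for every entry."""
    grad = np.zeros_like(X)
    for m in range(X.shape[0]):
        for n in range(X.shape[1]):
            step = np.zeros_like(X)
            step[m, n] = eps
            grad[m, n] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))  # arbitrary test matrix

numeric = finite_difference_gradient(trace_of_square, X)
analytic = 2 * X.T  # entry (m, n) equals 2 * X[n, m]

max_error = np.max(np.abs(numeric - analytic))
print("max abs difference:", max_error)
```

Note the transpose: the derivative with respect to $X_{mn}$ picks out $X_{nm}$, which matters whenever $X$ is not symmetric.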

Next, we need $\begin{align}\partial_{(q_i)_j} X_{mn} &=\partial_{(q_i)_j} Q_{km} Q_{kl} A_{ln} =\partial_{(q_i)_j} (q_m)_k (q_l)_k A_{ln} = \delta_{im} \delta_{jk} (q_l)_k A_{ln} + \delta_{il} \delta_{jk} (q_m)_k A_{ln}\\ &=\delta_{im} (q_l)_j A_{ln} +(q_m)_j A_{in}\\ &= \delta_{im} Q_{jl} A_{ln} + Q_{jm} A_{in} \end{align}$ because $Q_{mn} = (q_n)_m$.

In conclusion, we have $\begin{align}\partial_{(q_i)_j} \mathop{\rm tr}(Q^TQAQ^TQA) &= \partial_{X_{mn}} \mathop{\rm tr}X^2 \, \partial_{(q_i)_j} X_{mn} = 2 X_{nm} [ \delta_{im} Q_{jl} A_{ln} + Q_{jm} A_{in}]\\ &= 2 Q_{jl} A_{ln} X_{ni} + 2 Q_{jm} X_{nm} A_{in}\\ &= 2 Q_{jl} A_{ln} Q_{kn} Q_{kp} A_{pi} + 2Q_{jm} A_{pm} Q_{kp} Q_{kn} A_{in}\\ &= 2(Q A Q^T Q A + Q A^T Q^T Q A^T)_{ji}. \end{align}$

  • 0
    You inexplicably dropped $\delta_{im}$ in your expansion of $\partial_{(q_i)_j} X_{mn}$. That result should read $(\delta_{im} Q_{jl} A_{ln} + Q_{jm} A_{in})$, which changes the conclusion to $2(A^T Q^T Q A^T + A Q^T Q A) Q^T$. 2014-02-18
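One way to settle which closed form is right is a finite-difference check. The sketch below assumes the layout in which entry $(j,i)$ of the gradient matrix is $\partial F/\partial (q_i)_j$; under that layout the candidate is $2Q(AQ^TQA + A^TQ^TQA^T)$, the transpose of the expression $2(A^T Q^T Q A^T + A Q^T Q A) Q^T$. The function names and the random test matrices are illustrative choices, not part of the thread.

```python
import numpy as np

def F(Q, A):
    """F(Q) = tr(Q^T Q A Q^T Q A)."""
    X = Q.T @ Q @ A
    return np.trace(X @ X)

def analytic_gradient(Q, A):
    """Candidate closed form; entry (j, i) is dF / d(q_i)_j."""
    S = Q.T @ Q
    return 2 * Q @ (A @ S @ A + A.T @ S @ A.T)

def numeric_gradient(Q, A, eps=1e-5):
    """Central differences over every entry Q[j, i] = (q_i)_j."""
    grad = np.zeros_like(Q)
    for j in range(Q.shape[0]):
        for i in range(Q.shape[1]):
            step = np.zeros_like(Q)
            step[j, i] = eps
            grad[j, i] = (F(Q + step, A) - F(Q - step, A)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
N = 4
Q = rng.standard_normal((N, N))
A = rng.standard_normal((N, N))  # deliberately not symmetric

G = analytic_gradient(Q, A)
rel_error = np.max(np.abs(numeric_gradient(Q, A) - G)) / np.max(np.abs(G))
print("relative error:", rel_error)

# The derivative with respect to the column q_i is column i of G.
i = 2
dF_dq_i = G[:, i]
```

Using a non-symmetric $A$ matters here: for symmetric $A$ the two terms in the bracket coincide, which would hide transposition mistakes.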
1

Define the function $F:M_{N\times N}(\mathbb{R})\to \mathbb{R}$ by $F(Q)=\mathrm{tr}\Big(Q^TQAQ^TQA\Big)$. Then, for every $v\in\mathbb{R}^N$, $ \Big(\frac{\partial}{\partial q_i}\mathrm{tr}\Big(Q^TQAQ^TQA\Big)\Big)^T v=\mathcal{D} F(Q)\cdot[0\ldots v\ldots 0]. $ Here $ [0\ldots v\ldots 0] = \begin{pmatrix} 0&\dots &v_{1}&\dots & 0 \\ \vdots & \cdots & \vdots & \cdots & \vdots \\ 0&\dots &v_{N}&\dots &0 \\ \end{pmatrix} \mbox{ and } v= \begin{pmatrix} v_{1} \\ \vdots \\ v_{N} \end{pmatrix}, $ that is, the matrix whose $i$-th column is $v$ and whose other columns are zero, and $\mathcal{D}F(Q_0): M_{N\times N}(\mathbb{R})\to \mathbb{R}$ is the total derivative of $F$ at $Q_0$, i.e. $ F(Q_0+V)=F(Q_0)+\mathcal{D}F(Q_0)\cdot V+ \|V\|\cdot\rho(V),\quad \lim_{V\to 0}\rho(V)=0\quad \mbox{ and }\|V\|=\sqrt{tr(V^TV)}. $

Note that \begin{align} F(Q+V) = & tr\Big([Q+V]^T[Q+V]A[Q+V]^T[Q+V]A\Big) \\ = & tr(Q^TQAQ^TQA)+ \\ & \\ & +tr(V^TQAQ^TQA)+tr(Q^TVAQ^TQA)+ \\ & +tr(Q^TQAV^TQA)+tr(Q^TQAQ^TVA)+ \\ & \\ & +tr(V^TVAQ^TQA)+ \\ & +tr(V^TQAV^TQA)+tr(V^TQAQ^TVA)+ \\ & +tr(Q^TVAV^TQA)+tr(Q^TVAQ^TVA)+ \\ & +tr(Q^TQAV^TVA)+ \\ & \\ & +tr(Q^TVAV^TVA)+tr(V^TQAV^TVA)+ \\ & +tr(V^TVAQ^TVA)+tr(V^TVAV^TQA)+ \\ & \\ & +tr(V^TVAV^TVA) \end{align} Only the four terms that are linear in $V$ contribute to the total derivative, which implies \begin{align} \mathcal{D}F(Q)\cdot V= & tr(V^TQAQ^TQA)+tr(Q^TVAQ^TQA) \\ & +tr(Q^TQAV^TQA)+tr(Q^TQAQ^TVA). \end{align}

Replace the matrix $V$ by the matrix $[0,\ldots,v,\ldots,0]$ and calculate the answer. Good luck.
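The linear term can also be checked numerically. This is a minimal sketch, assuming the first-order terms come from expanding $tr\big((Q+V)^T(Q+V)A(Q+V)^T(Q+V)A\big)$ and that $V$ is nonzero only in column $i$; the names `directional_derivative`, `v`, and `t` are introduced here for illustration.

```python
import numpy as np

def F(Q, A):
    """F(Q) = tr(Q^T Q A Q^T Q A)."""
    X = Q.T @ Q @ A
    return np.trace(X @ X)

def directional_derivative(Q, A, V):
    """Sum of the four terms of F(Q + V) that are linear in V."""
    S = Q.T @ Q
    return (np.trace(V.T @ Q @ A @ S @ A)
            + np.trace(Q.T @ V @ A @ S @ A)
            + np.trace(S @ A @ V.T @ Q @ A)
            + np.trace(S @ A @ Q.T @ V @ A))

rng = np.random.default_rng(2)
N = 4
Q = rng.standard_normal((N, N))
A = rng.standard_normal((N, N))

# Perturb only column i of Q, in an arbitrary direction v.
i = 1
v = rng.standard_normal(N)
V = np.zeros((N, N))
V[:, i] = v

# Central difference along the direction V.
t = 1e-5
numeric = (F(Q + t * V, A) - F(Q - t * V, A)) / (2 * t)
analytic = directional_derivative(Q, A, V)

rel_error = abs(numeric - analytic) / max(1.0, abs(analytic))
print("relative error:", rel_error)
```

Taking `v` to be each standard basis vector in turn recovers the components of the derivative with respect to $q_i$ one at a time.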