
I googled around and searched inside the forum but I'm still confused about a problem.

I have 2 matrix functions $f,g : \mathbb{R}^{n \times n} \times \mathbb{R}^{a \times b} \rightarrow \mathbb{R}^{n \times n}$. Starting from this, I have the following expression:

$ t(Q, X, Y) = \text{tr}(f(g(Q, X),Y))$

where $\text{tr}$ is the trace operator and $X, Y \in \mathbb{R}^{a \times b}$ and $Q \in \mathbb{R}^{n \times n}$.

How do I evaluate $\frac{\partial t(Q,X, Y)}{\partial X}$ and $\frac{\partial t(Q,X, Y)}{\partial Y}$?

I mean, I would like to know how to correctly apply the chain rule.

* Addition *

I will try to give more information about my problem. Suppose that $a = b = n$ and that $f(A,B) = AB$ and $g(A,B) = BA + AB$ (actually this is only an example of possible functions $f$ and $g$). Then I have that:

$f(g(Q,X),Y) = f(XQ + QX, Y) = XQY + QXY$

Then, using matrix calculus (hoping there are no errors!), I have that:

$\frac{\partial t(Q,X,Y)}{\partial X} = QY + YQ$

$\frac{\partial t(Q,X,Y)}{\partial Y} = XQ + QX$
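As a sanity check on this example, here is a minimal NumPy sketch that compares central finite differences with the closed forms. It assumes the element-wise convention $(\partial t/\partial X)_{ij} = \partial t/\partial X_{ij}$; under that convention the gradients come out as the transposes $(QY + YQ)^T$ and $(XQ + QX)^T$, while the untransposed expressions correspond to the numerator layout in the Wikipedia table. The helper name `numerical_gradient`, the size `n = 4`, and the random test matrices are illustrative choices only.

```python
import numpy as np

def t(Q, X, Y):
    # Example from the question: g(Q, X) = XQ + QX and f(G, Y) = GY
    return np.trace((X @ Q + Q @ X) @ Y)

def numerical_gradient(func, M, eps=1e-6):
    """Central differences; entry (i, j) approximates d func / d M[i, j]."""
    grad = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            E = np.zeros_like(M)
            E[i, j] = eps
            grad[i, j] = (func(M + E) - func(M - E)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 4                           # assumed size, with a = b = n as in the example
Q, X, Y = (rng.standard_normal((n, n)) for _ in range(3))

grad_X = numerical_gradient(lambda M: t(Q, M, Y), X)
grad_Y = numerical_gradient(lambda M: t(Q, X, M), Y)

# Element-wise gradients are the transposes of QY + YQ and XQ + QX
assert np.allclose(grad_X, (Q @ Y + Y @ Q).T, atol=1e-6)
assert np.allclose(grad_Y, (X @ Q + Q @ X).T, atol=1e-6)
```

Because $t$ is linear in $X$ and in $Y$ separately, the central differences are exact up to rounding, so the comparison is a fairly strict test of the layout convention.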

I can easily compute the result if I know the form of $f$ and $g$. Notice that the derivatives I obtained are in matrix form. But I actually need to deal with generic functions, and for this reason I need to use the chain rule. The problem is that the chain rule formulas I know only give the derivative with respect to a single element of the matrix $X$ (or $Y$). With them, I'm not able to obtain the derivatives in matrix form.

So, my question is: is there a chain rule formula I'm missing that lets me express these derivatives in matrix form?

* Addition 2 *

The chain rule formulas that I know are reported here http://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-matrix_identities (see the 7th row of the table)

  • I'm going to try to give more details in my question. (2012-12-19)

1 Answer

Let's use uppercase letters for the matrix variables, so they're easy to distinguish from the lowercase scalars:

$\eqalign{ G &= G(Q,X) \cr F &= F(G,Y) \cr t &= {\rm tr}(F) = I:F \cr dt &= I:dF \cr }$

First, let's calculate the differential and gradient wrt $Y$:

$\eqalign{ dt &= I:\Big(\frac{\partial F}{\partial Y}:dY\Big) \cr \frac{\partial t}{\partial Y} &= I:\frac{\partial F}{\partial Y} \cr }$

And now wrt $X$:

$\eqalign{ dt &= I:\Big(\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X}:dX\Big) \cr \frac{\partial t}{\partial X} &= I:\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X} \cr }$

Note that the matrix-by-matrix gradients are 4th order tensors. For example, here is one of the gradients in component form:

$\Big(\frac{\partial G}{\partial X}\Big)_{ijkl} = \frac{\partial G_{ij}}{\partial X_{kl}}$

Also note that colons are used to denote the double-contraction product, with summation over the repeated indices $m,n$ implied, e.g.

$\Big(\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X}\Big)_{ijkl} = \frac{\partial F_{ij}}{\partial G_{mn}}\,\frac{\partial G_{mn}}{\partial X_{kl}}$
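To make the index bookkeeping concrete, the contractions can be sketched with `numpy.einsum` for the example from the question, $G = XQ + QX$ and $F = GY$. The 4th order tensors below are written out by hand from $\partial G_{ij}/\partial X_{kl} = \delta_{ik}Q_{lj} + Q_{ik}\delta_{jl}$, $\partial F_{ij}/\partial G_{mn} = \delta_{im}Y_{nj}$ and $\partial F_{ij}/\partial Y_{kl} = G_{ik}\delta_{jl}$; the variable names, the size `n = 3`, and the random inputs are assumptions made for illustration. For generic $F$ and $G$ these tensors would have to be supplied by whatever defines those functions.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, arbitrary choice
n = 3                           # assumed size, with a = b = n
Q, X, Y = (rng.standard_normal((n, n)) for _ in range(3))
I = np.eye(n)

G = X @ Q + Q @ X  # G(Q, X) from the question's example
F = G @ Y          # F(G, Y) from the question's example

# 4th order gradients, indexed as (dA/dB)[i, j, k, l] = dA_ij / dB_kl
dG_dX = np.einsum('ik,lj->ijkl', I, Q) + np.einsum('ik,jl->ijkl', Q, I)
dF_dG = np.einsum('im,nj->ijmn', I, Y)
dF_dY = np.einsum('ik,jl->ijkl', G, I)

# dt/dX = I : dF/dG : dG/dX and dt/dY = I : dF/dY
grad_X = np.einsum('ij,ijmn,mnkl->kl', I, dF_dG, dG_dX)
grad_Y = np.einsum('ij,ijkl->kl', I, dF_dY)

# Entry (k, l) is dt/dX_kl, so the results are ordinary n-by-n matrices
assert np.allclose(grad_X, (Q @ Y + Y @ Q).T)
assert np.allclose(grad_Y, (X @ Q + Q @ X).T)
```

The contraction with $I$ removes the first two indices, which is why the result has the same shape as $X$ (or $Y$) even though the intermediate gradients are 4th order tensors. With this index convention the matrices are the transposes of $QY + YQ$ and $XQ + QX$.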