I googled around and searched inside the forum but I'm still confused about a problem.

I have two matrix functions $f,g : \mathbb{R}^{n \times n} \times \mathbb{R}^{a \times b} \rightarrow \mathbb{R}^{n \times n}$. From these, I build the following expression:

$$ t(Q, X, Y) = \text{tr}(f(g(Q, X),Y))$$

where $\text{tr}$ is the trace operator and $X, Y \in \mathbb{R}^{a \times b}$ and $Q \in \mathbb{R}^{n \times n}$.

How do I evaluate $\frac{\partial t(Q,X, Y)}{\partial X}$ and $\frac{\partial t(Q,X, Y)}{\partial Y}$?

I mean, I would like to know how to correctly apply the chain rule.

* Addition *

I will try to give more information about my problem. Suppose that $a = b = n$ and that $f(A,B) = AB$ and $g(A,B) = BA + AB$ (actually this is only an example of possible functions $f$ and $g$). Then I have that:

$$f(g(Q,X),Y) = f(XQ + QX, Y) = XQY + QXY$$

Then, using matrix calculus (hoping there are no errors!), I have that:

$$\eqalign{ \frac{\partial t(Q,X,Y)}{\partial X} &= QY + YQ \cr \frac{\partial t(Q,X,Y)}{\partial Y} &= XQ + QX \cr }$$
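A quick numerical sanity check of these two expressions can be sketched with finite differences. This is only an illustration under stated assumptions: the helper names (`matmul`, `numerical_gradient`, and so on) are made up here, the matrices are random $3 \times 3$ examples, and entry $(i,j)$ of the numerical gradient is taken to be $\partial t / \partial X_{ij}$. Under that indexing the closed forms show up transposed, which suggests the expressions above are written in the numerator-layout convention.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def t(Q, X, Y):
    """t = tr(f(g(Q, X), Y)) with g(Q, X) = XQ + QX and f(G, Y) = GY."""
    G = add(matmul(X, Q), matmul(Q, X))
    return trace(matmul(G, Y))

def numerical_gradient(func, M, eps=1e-6):
    """Central differences; entry [i][j] approximates d func / d M[i][j]."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            plus = [row[:] for row in M]
            minus = [row[:] for row in M]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (func(plus) - func(minus)) / (2 * eps)
    return grad

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def random_matrix(n):
    return [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

random.seed(0)
n = 3  # assumed size for the example (a = b = n)
Q, X, Y = random_matrix(n), random_matrix(n), random_matrix(n)

grad_X = numerical_gradient(lambda M: t(Q, M, Y), X)
grad_Y = numerical_gradient(lambda M: t(Q, X, M), Y)

closed_X = add(matmul(Q, Y), matmul(Y, Q))  # QY + YQ
closed_Y = add(matmul(X, Q), matmul(Q, X))  # XQ + QX

# With grad[i][j] = dt/dM[i][j], the closed forms match after a transpose.
err_X = max_abs_diff(grad_X, transpose(closed_X))
err_Y = max_abs_diff(grad_Y, transpose(closed_Y))
print(err_X, err_Y)
```

Because $t$ is linear in $X$ and in $Y$ separately, the central differences are exact up to rounding, so both printed errors should be tiny.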

I can easily compute the result if I know the form of $f$ and $g$. Notice that the derivatives I obtained are in matrix form. But I actually need to deal with generic functions, and for this reason I need to use the chain rule. The problem is that the chain rule formulas I know give the derivative with respect to a single element of the matrix $X$ (or $Y$), so I am not able to write the derivatives in matrix form.

So my question is: is there a chain rule formula I'm missing that lets me express these derivatives in matrix form?

* Addition 2 *

The chain rule formulas that I know are reported here http://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-matrix_identities (see the 7th row of the table)

  • It might help to consider the function $h(Q,X,Y) = (g(Q,X),Y)$, whose differential is easily calculated. Then $t(Q,X,Y) = l \circ f \circ h (Q,X,Y)$, where $l$ is the trace. (2012-12-17)
  • I know this formula (http://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-matrix_identities - it is the 7th formula in the table). It is applied to each $X_{i,j}$ separately! I would like to know the formula with respect to the whole of $X$. (2012-12-18)
  • I'm going to try to give more details in my question. (2012-12-19)

1 Answer

Let's use uppercase letters for the matrix variables, so they're easy to distinguish from the lowercase scalars:

$$\eqalign{ G &= G(Q,X) \cr F &= F(G,Y) \cr t &= {\rm tr}(F) = I:F \cr dt &= I:dF \cr }$$

First, let's calculate the differential and gradient wrt $Y$:

$$\eqalign{ dt &= I:\Big(\frac{\partial F}{\partial Y}:dY\Big) \cr \frac{\partial t}{\partial Y} &= I:\frac{\partial F}{\partial Y} \cr }$$

And now wrt $X$:

$$\eqalign{ dt &= I:\Big(\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X}:dX\Big) \cr \frac{\partial t}{\partial X} &= I:\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X} \cr }$$

Note that the matrix-by-matrix gradients are 4th order tensors. For example, here is one of the gradients in component form:

$$\Big(\frac{\partial G}{\partial X}\Big)_{ijkl} = \frac{\partial G_{ij}}{\partial X_{kl}}$$
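As a worked illustration, here is a sketch of these component forms for the example from the question, $G = XQ + QX$ and $F = GY$. It assumes $a = b = n$ and summation over repeated indices, and $\delta$ denotes the Kronecker delta:

$$\eqalign{ G_{ij} &= X_{ia}Q_{aj} + Q_{ia}X_{aj} \cr \frac{\partial G_{ij}}{\partial X_{kl}} &= \delta_{ik}Q_{lj} + Q_{ik}\delta_{jl} \cr F_{ij} &= G_{ia}Y_{aj} \cr \frac{\partial F_{ij}}{\partial G_{mn}} &= \delta_{im}Y_{nj} \cr \frac{\partial F_{ij}}{\partial Y_{kl}} &= G_{ik}\delta_{jl} \cr }$$

Taking the trace (setting $i=j$ and summing) then gives

$$\eqalign{ \frac{\partial t}{\partial Y_{kl}} &= G_{ik}\,\delta_{il} = G_{lk} \cr \frac{\partial t}{\partial X_{kl}} &= \delta_{im}Y_{ni}\big(\delta_{mk}Q_{ln} + Q_{mk}\delta_{nl}\big) = (QY)_{lk} + (YQ)_{lk} \cr }$$

so in this index convention the gradients are $(XQ+QX)^T$ and $(QY+YQ)^T$. These are the transposes of the expressions in the question, which appear to follow the numerator-layout convention.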

Also note that colons are used to denote the double-contraction product, with summation over the repeated indices $m$ and $n$ implied, e.g. $$\Big(\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X}\Big)_{ijkl} = \frac{\partial F_{ij}}{\partial G_{mn}}\,\frac{\partial G_{mn}}{\partial X_{kl}}$$
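The contraction pattern can also be checked numerically. The following is a minimal sketch rather than a definitive implementation: it assumes the example functions from the question ($G = XQ + QX$, $F = GY$), approximates the 4th order gradients by central differences, and uses helper names (`jacobian`, `double_contract`, `trace_contract`) and $2 \times 2$ test matrices that are made up for this illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def g(Q, X):
    """Example from the question: G = XQ + QX."""
    return add(matmul(X, Q), matmul(Q, X))

def f(G, Y):
    """Example from the question: F = GY."""
    return matmul(G, Y)

def jacobian(func, M, eps=1e-6):
    """4th order tensor J[i][j][k][l] ~ d func(M)[i][j] / d M[k][l]."""
    out = func(M)
    rows, cols = len(out), len(out[0])
    J = [[[[0.0] * len(M[0]) for _ in M] for _ in range(cols)] for _ in range(rows)]
    for k in range(len(M)):
        for l in range(len(M[0])):
            plus = [row[:] for row in M]
            minus = [row[:] for row in M]
            plus[k][l] += eps
            minus[k][l] -= eps
            Fp, Fm = func(plus), func(minus)
            for i in range(rows):
                for j in range(cols):
                    J[i][j][k][l] = (Fp[i][j] - Fm[i][j]) / (2 * eps)
    return J

def double_contract(A, B):
    """(A:B)[i][j][k][l] = sum over m, n of A[i][j][m][n] * B[m][n][k][l]."""
    n_m, n_n = len(B), len(B[0])
    n_k, n_l = len(B[0][0]), len(B[0][0][0])
    return [[[[sum(A[i][j][m][n] * B[m][n][k][l]
                   for m in range(n_m) for n in range(n_n))
               for l in range(n_l)] for k in range(n_k)]
             for j in range(len(A[0]))] for i in range(len(A))]

def trace_contract(T):
    """(I:T)[k][l] = sum over i of T[i][i][k][l]."""
    n_k, n_l = len(T[0][0]), len(T[0][0][0])
    return [[sum(T[i][i][k][l] for i in range(len(T))) for l in range(n_l)]
            for k in range(n_k)]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Arbitrary 2x2 test matrices (illustrative values only).
Q = [[1.0, 2.0], [0.5, -1.0]]
X = [[0.3, -0.7], [1.5, 0.2]]
Y = [[2.0, 1.0], [-1.0, 0.4]]

G = g(Q, X)
dG_dX = jacobian(lambda M: g(Q, M), X)
dF_dG = jacobian(lambda M: f(M, Y), G)
dF_dY = jacobian(lambda M: f(G, M), Y)

grad_X = trace_contract(double_contract(dF_dG, dG_dX))  # I : dF/dG : dG/dX
grad_Y = trace_contract(dF_dY)                          # I : dF/dY

# grad[k][l] = dt/dM[k][l], so compare with the transposed closed forms.
err_X = max_abs_diff(grad_X, transpose(add(matmul(Q, Y), matmul(Y, Q))))
err_Y = max_abs_diff(grad_Y, transpose(G))
print(err_X, err_Y)
```

Here `grad_X[k][l]` approximates $\partial t / \partial X_{kl}$, so it is compared against $(QY + YQ)^T$ rather than $QY + YQ$. For generic $f$ and $g$ only the two function definitions would change; the contraction steps stay the same.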