
I found the following product rule for matrix functions in https://ccrma.stanford.edu/~dattorro/matrixcalc.pdf: $$ \frac{d(f(x)^Tg(x))}{dx}=\frac{df(x)}{dx}\cdot g(x)+\frac{dg(x)}{dx}\cdot f(x) $$ and verified it against the examples listed at http://www.psi.toronto.edu/matrix/calculus.html, for example: $$ \frac{d (Ax+b)^TC(Dx+e)}{dx} = A^TC(Dx+e) + D^TC^T(Ax+b) $$ $$ \frac{d (x^TCx)}{dx} = (C+C^T)x $$

But I am confused by this one: $$ \frac{d (a^TX^TXb)}{dX} = X(ab^T + ba^T) $$ Following the formula in matrixcalc.pdf, I get $$\begin{align} f(x)&=Xa\\ g(x)&=Xb\\ \frac{d (a^TX^TXb)}{dX} &= \frac{df(x)}{dx}\cdot Xb+\frac{dg(x)}{dx}\cdot Xa \\ &=a^TXb+b^TXa\end{align}$$ which differs from the correct result. I don't know what I'm doing wrong; please help. Thanks!
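As a sanity check of the two vector examples quoted above (not of the matrix case), here is a minimal pure-Python sketch that compares each claimed gradient with central differences. It assumes the convention that the derivative of a scalar with respect to $x$ is a column vector of the same length as $x$; the matrices $A$, $C$, $D$ and vectors $b$, $e$, $x$ are arbitrary test values, taken square so that one $C$ serves both examples.

```python
# Central-difference check of two identities quoted in the question,
# with d(scalar)/dx taken as a column vector the same length as x.
# The matrices and vectors below are arbitrary test data (assumptions).


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def add(p, q):
    return [pi + qi for pi, qi in zip(p, q)]


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of scalar f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


A = [[0.0, 1.0, 2.0], [3.0, -1.0, 1.0], [1.0, 0.0, -2.0]]
C = [[1.0, 2.0, 0.0], [-1.0, 3.0, 4.0], [2.0, 0.5, -2.0]]
D = [[1.0, 1.0, 0.0], [2.0, 0.0, -1.0], [0.5, -1.0, 3.0]]
b = [1.0, -1.0, 0.0]
e = [0.5, 2.0, -1.0]
x = [0.3, -1.2, 2.0]


def f1(v):
    """x^T C x"""
    return dot(v, matvec(C, v))


def f2(v):
    """(Ax+b)^T C (Dx+e)"""
    return dot(add(matvec(A, v), b), matvec(C, add(matvec(D, v), e)))


# Claimed gradients: (C + C^T) x  and  A^T C (Dx+e) + D^T C^T (Ax+b)
grad1 = add(matvec(C, x), matvec(transpose(C), x))
p, q = add(matvec(A, x), b), add(matvec(D, x), e)
grad2 = add(matvec(transpose(A), matvec(C, q)),
            matvec(transpose(D), matvec(transpose(C), p)))

for name, f, claimed in [("x^T C x", f1, grad1), ("(Ax+b)^T C (Dx+e)", f2, grad2)]:
    numeric = numerical_gradient(f, x)
    err = max(abs(n - c) for n, c in zip(numeric, claimed))
    print(f"{name}: max abs error {err:.1e}")
```

Both functions are quadratic in $x$, so central differences agree with the exact gradient up to floating-point rounding.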

  • Pretty nearly unreadable. Maybe you want to consult http://meta.math.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference and/or http://meta.math.stackexchange.com/questions/1773/do-we-have-an-equation-editing-howto (2012-10-01)
  • In the first equation, do you mean: $$\frac{d(f(x)^T g(x))}{dx}=\frac{d(f(x))}{dx}\cdot g(x)+\frac{d(g(x))}{dx}\cdot f(x)$$ Can you please use $^T$ for transpose, since $'$ is confusing when derivatives are also of concern... (2012-10-01)
  • Gerry Myerson: Yes, in the first equation, I mean that. Dennis Gulko: I'm sorry, I'm new here; I have corrected that. (2012-10-01)

2 Answers


Coming back to the definitions and considering $u:X\mapsto a^TX^TXb$, one looks for a linear function $v_X:H\mapsto v_X(H)$ such that $u(X+H)=u(X)+v_X(H)+o(H)$. Now, $$ u(X+H)=u(X)+a^TH^TXb+a^TX^THb+a^TH^THb, $$ and the last term is $o(H)$ because $|a^TH^THb|=|(Ha)^T(Hb)|\leqslant\|H\|^2\|a\|\,\|b\|$ is quadratic in $H$. Hence the differential of $u$ at $X$ is the linear function $v_X$ defined by $$ v_X(H)=a^TH^TXb+a^TX^THb. $$ Note that $v_X$ is not of the form $H\mapsto XFH$ for any matrix $F$. However, one can express $v_X(H)$ as the trace of a matrix, as follows.
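The expansion can be confirmed exactly with integer data. In this minimal sketch (arbitrary test values; the helper names `form`, `shifted`, `u` and `v` are illustrative only), replacing $H$ by $tH$ leaves, after removing $u(X)$ and $t\,v_X(H)$, exactly $t^2a^TH^THb$: a term quadratic in $t$, hence $o(H)$. A rectangular $X$ is used to show nothing requires $X$ to be square.

```python
# Exact integer check of the expansion, for u(X) = a^T X^T X b:
#   u(X + tH) - u(X) - t*v_X(H) == t^2 * a^T H^T H b,
# where v_X(H) = a^T H^T X b + a^T X^T H b. X, H, a, b are arbitrary test data.


def matvec(M, vec):
    return [sum(mij * vj for mij, vj in zip(row, vec)) for row in M]


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def form(P, Q, a, b):
    """a^T P^T Q b, computed as the dot product of P a and Q b."""
    return dot(matvec(P, a), matvec(Q, b))


def shifted(X, H, t):
    """X + t*H."""
    return [[x + t * h for x, h in zip(rx, rh)] for rx, rh in zip(X, H)]


def u(X, a, b):
    return form(X, X, a, b)


def v(X, H, a, b):
    """Linear part v_X(H) = a^T H^T X b + a^T X^T H b."""
    return form(H, X, a, b) + form(X, H, a, b)


X = [[1, 2], [0, -1], [3, 1]]    # 3 x 2, so X need not be square
H = [[2, -1], [1, 0], [-2, 4]]   # increment, same shape as X
a, b = [1, 3], [-2, 1]

for t in [1, 2, 5, 10]:
    remainder = u(shifted(X, H, t), a, b) - u(X, a, b) - t * v(X, H, a, b)
    # The remainder is exactly the quadratic term t^2 * a^T H^T H b
    assert remainder == t * t * form(H, H, a, b)
    print(f"t={t}: remainder={remainder}")
```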

The matrices $Ha$, $Hb$, $Xa$ and $Xb$ are all column vectors and, for any column vectors $C$ and $D$ of the same size, $C^TD$ is a scalar, hence $C^TD=D^TC$. In particular, $a^TX^THb=(Xa)^T(Hb)=(Hb)^T(Xa)=b^TH^TXa$. Using the facts that the trace of a $1\times1$ matrix is its only entry, that $\mathrm{tr}(CD)=\mathrm{tr}(DC)$, and that $\mathrm{tr}(C+D)=\mathrm{tr}(C)+\mathrm{tr}(D)$ for all matrices $C$ and $D$ of suitable sizes, one gets $$ v_X(H)=\mathrm{tr}(a^TH^TXb)+\mathrm{tr}(b^TH^TXa)=\mathrm{tr}(H^TXba^T)+\mathrm{tr}(H^TXab^T)=\mathrm{tr}(H^T(Xba^T+Xab^T)). $$ If one wishes to call derivative of $u$ at $X$ the matrix $W_X$ such that, for every $H$, $v_X(H)=\mathrm{tr}(H^TW_X)$ (see more about this in the comments), then, in the case at hand, the derivative at $X$ is $$ W_X=X(ba^T+ab^T). $$
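A spot check of the trace identity, as a minimal sketch with hypothetical helper names and arbitrary random integer data: it computes $v_X(H)$ literally as $a^TH^TXb+a^TX^THb$ from explicit transposes and compares it with $\mathrm{tr}(H^TW_X)$ for $W_X=X(ba^T+ab^T)$, using a rectangular $X$. Integer entries make the comparison exact rather than approximate.

```python
# Exact check that v_X(H) = tr(H^T W_X) with W_X = X(b a^T + a b^T),
# for a rectangular X (4 x 3) and many random increments H.
# Integer entries keep the comparison exact; the seed and sizes are arbitrary.
import random


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def outer(p, q):
    return [[pi * qj for qj in q] for pi in p]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def column(vec):
    return [[x] for x in vec]


def v_X(X, H, a, b):
    """a^T H^T X b + a^T X^T H b, built from explicit transposes (1x1 results)."""
    A, B = column(a), column(b)
    t1 = matmul(matmul(matmul(transpose(A), transpose(H)), X), B)
    t2 = matmul(matmul(matmul(transpose(A), transpose(X)), H), B)
    return t1[0][0] + t2[0][0]  # the trace of a 1x1 matrix is its only entry


def W(X, a, b):
    """Candidate derivative X (b a^T + a b^T); it has the same shape as X."""
    S = [[s + r for s, r in zip(rs, rr)] for rs, rr in zip(outer(b, a), outer(a, b))]
    return matmul(X, S)


rng = random.Random(0)
m, n = 4, 3  # X and H are m x n; a and b have length n


def rand_mat(rows, cols):
    return [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]


X = rand_mat(m, n)
a = [rng.randint(-5, 5) for _ in range(n)]
b = [rng.randint(-5, 5) for _ in range(n)]
WX = W(X, a, b)

for _ in range(100):
    H = rand_mat(m, n)
    assert v_X(X, H, a, b) == trace(matmul(transpose(H), WX))
print("v_X(H) == tr(H^T W_X) held for 100 random H; W_X is",
      len(WX), "x", len(WX[0]))
```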

  • Why should it be $H\mapsto XFH$? The derivative with respect to $X$ is a matrix $A$ such that $v_X(H)=\operatorname{tr}(AH^T)$, and by the invariance of the trace under cyclic rotation and transposition, $A=X(ab^T+ba^T)$ indeed gives your $v_X(H)$. (2012-10-01)
  • @joriki We have a problem of definition here. To me, the differential of any real valued function $u$ at any point $x$ is the unique linear function $v_x$ such that $u(x+h)=u(x)+v_x(h)+o(h)$. In the case at hand, it happens, as you write, that $v_X(H)=\mathrm{tr}(A_XH^T)$ for some matrix $A_X$, but I fail to see why this would make $A_X$ the differential of $u$ at $X$. (2012-10-01)
  • But the question is about the derivative, not the differential. The derivative is the "constant of proportionality" in the differential, and my trace formula is just a convenient way of expressing that proportionality for the differential of a function of a matrix. See equation $(1725)$ on p. 659 of the PDF file linked to in the question for an equivalent definition of what they call the "gradient" with respect to $X$. (2012-10-01)
  • @joriki Thanks for the pointer. I am not sure I subscribe to this terminology, nor that it is universal, nor that it is useful. I agree that the trace formula you mention is a trick to express concisely the differential. Unfortunately, I feel that this brevity hinders the learning of these notions more than it helps (as exemplified by nearly every question on this site about multidimensional differential calculus), which is why I prefer to come back to the definitions. But hey, one may disagree with such a stance... (2012-10-01)
  • I see your point and I agree about the confusion apparent in the questions asked; but I think it goes too far to say that the formula is *wrong* when it's correct according to the definition being used, irrespective of the merits of that definition. (2012-10-01)
  • @joriki As usual you are right... Post modified on this point. (Not quite unrelated: did you read the posts on meta about so-called *moderator-wars*?) (2012-10-01)

When calculating the derivative of a matrix function, I strongly recommend using the most basic rule: $\mathrm{d}(f(X)g(X))=\mathrm{d}f(X)\,g(X)+f(X)\,\mathrm{d}g(X)$. I don't recommend using higher-level identities: on the one hand, you need to remember them or look them up before you can use them; on the other hand, you must be very clear about the conditions under which they are valid.

First, in the first equation you give, I believe $x$ should be a vector, whereas in the problem that confuses you, $X$ is a matrix. I don't think the first equation can be applied in this case.

Second, since $a^TX^TXb$ is a scalar and $X$ is a matrix, the derivative of $a^TX^TXb$ with respect to $X$ should be a matrix of the same size as $X$, not a scalar (see eq. 1725 in 'matrix calculus').

Third, here is the derivative of $a^TX^TXb$ with respect to $X$, computed with the basic rule given at the beginning.

$$\begin{aligned}\mathrm{d}(a^TX^TXb)&=a^T(\mathrm{d}X)^TXb+a^TX^T(\mathrm{d}X)b\\&=b^TX^T(\mathrm{d}X)a+a^TX^T(\mathrm{d}X)b\\&=\mathrm{tr}(ab^TX^T\mathrm{d}X+ba^TX^T\mathrm{d}X)\\&=\mathrm{tr}[(ab^T+ba^T)X^T\mathrm{d}X]\end{aligned}$$

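Since the derivative $W$ of a scalar $u$ with respect to $X$ is the matrix satisfying $\mathrm{d}u=\mathrm{tr}(W^T\mathrm{d}X)$ (its $(i,j)$ entry is $\partial u/\partial X_{ij}$), and $[(ab^T+ba^T)X^T]^T=X(ba^T+ab^T)$, it follows that $$\frac{\mathrm{d}(a^TX^TXb)}{\mathrm{d}X}=X(ba^T+ab^T)$$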
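To cross-check the result entry by entry, here is a small pure-Python sketch that approximates each $\partial u/\partial X_{ij}$ by central differences and compares it with $X(ba^T+ab^T)$. The test values are arbitrary assumptions; since $u$ is quadratic in $X$, central differences are exact up to floating-point rounding, so the error should be tiny. The printed shape also shows that the derivative has the same size as $X$ (here $3\times 2$).

```python
# Entry-by-entry central-difference check of
#   d(a^T X^T X b)/dX = X(b a^T + a b^T)
# for a rectangular X (3 x 2). Test data are arbitrary.


def u(X, a, b):
    """a^T X^T X b, computed as the dot product of X a and X b."""
    Xa = [sum(x * ai for x, ai in zip(row, a)) for row in X]
    Xb = [sum(x * bi for x, bi in zip(row, b)) for row in X]
    return sum(p * q for p, q in zip(Xa, Xb))


def claimed_gradient(X, a, b):
    """X (b a^T + a b^T); entry (i, j) is sum_k X[i][k] * (b[k] a[j] + a[k] b[j])."""
    n = len(a)
    return [[sum(X[i][k] * (b[k] * a[j] + a[k] * b[j]) for k in range(n))
             for j in range(n)] for i in range(len(X))]


def numerical_gradient(f, X, h=1e-6):
    """Matrix of central differences (f(X + h E_ij) - f(X - h E_ij)) / (2h)."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((f(Xp) - f(Xm)) / (2 * h))
        G.append(row)
    return G


X = [[0.5, -1.0], [2.0, 0.3], [-1.5, 1.0]]
a = [1.0, -2.0]
b = [0.5, 3.0]

numeric = numerical_gradient(lambda Y: u(Y, a, b), X)
claimed = claimed_gradient(X, a, b)
err = max(abs(p - q) for rn, rc in zip(numeric, claimed) for p, q in zip(rn, rc))
print("derivative shape:", len(claimed), "x", len(claimed[0]))  # same as X
print("max abs error:", err)
```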

  • The PDF file linked to in the question does claim on p. 661 that the formula is applicable to matrix variables. (2012-10-01)