3

I found a product rule for matrix functions in https://ccrma.stanford.edu/~dattorro/matrixcalc.pdf: $ \frac{d(f(x)^Tg(x))}{dx}=\frac{df(x)}{dx}\cdot g(x)+\frac{dg(x)}{dx}\cdot f(x) $ I checked it against the list of examples at http://www.psi.toronto.edu/matrix/calculus.html. For example: $ \frac{d (Ax+b)^TC(Dx+e)}{dx} = A^TC(Dx+e) + D^TC^T(Ax+b) $ $ \frac{d (x^TCx)}{dx} = (C+C^T)x $

But this one confuses me: $ \frac{d (a^TX^TXb)}{dX} = X(ab^T + ba^T) $ Following the formula in matrixcalc.pdf, I get $\begin{align} f(x)&=Xa\\ g(x)&=Xb\\ \frac{d (a^TX^TXb)}{dX} &= \frac{df(x)}{dx}\cdot Xb+\frac{dg(x)}{dx}\cdot Xa \\ &=a^TXb+b^TXa\end{align}$ which differs from the correct result. I don't know what I'm doing wrong; please help me. Thanks!

  • 0
    Gerry Myerson: Yes, that is what I mean in the first equation. Dennis Gulko: I'm sorry, this is my first time here; I have corrected that. 2012-10-01

2 Answers 2

2

Going back to the definition and considering $u:X\mapsto a^TX^TXb$, one looks for a linear function $v_X:H\mapsto v_X(H)$ such that $u(X+H)=u(X)+v_X(H)+o(H)$. Now, $ u(X+H)=u(X)+a^TH^TXb+a^TX^THb+a^TH^THb, $ and the last term is quadratic in $H$, hence $o(H)$. Therefore the gradient of $u$ at $X$ is the linear function $v_X$ defined by $ v_X(H)=a^TH^TXb+a^TX^THb. $ Note that this is not $H\mapsto XFH$, for any matrix $F$. However, one can express $v_X(H)$ as the trace of a matrix, as follows.
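The expansion of $u(X+H)$ can be checked exactly with integer matrices. The following is a minimal sketch in plain Python (lists of rows, no third-party packages); the helper names and the test data are illustrative assumptions, not part of the original answer.

```python
# Exact check of u(X+H) = u(X) + v_X(H) + a^T H^T H b with integer matrices.
# Helper names (matmul, transpose, add, scalar) are illustrative only.

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar(*factors):
    """Multiply the factors left to right; return the entry of the 1x1 result."""
    out = factors[0]
    for F in factors[1:]:
        out = matmul(out, F)
    assert len(out) == 1 and len(out[0]) == 1
    return out[0][0]

# Arbitrary test data: X and H are 3x2, a and b are 2x1 column vectors.
X = [[1, 2], [3, -1], [0, 4]]
H = [[2, -1], [1, 1], [-3, 2]]
a = [[1], [-2]]
b = [[3], [1]]

def u(M):
    """u(M) = a^T M^T M b."""
    return scalar(transpose(a), transpose(M), M, b)

def v(X, H):
    """The linear part v_X(H) = a^T H^T X b + a^T X^T H b."""
    return (scalar(transpose(a), transpose(H), X, b)
            + scalar(transpose(a), transpose(X), H, b))

remainder = u(add(X, H)) - u(X) - v(X, H)
second_order = scalar(transpose(a), transpose(H), H, b)
print(remainder == second_order)  # the identity is exact, so this prints True
```

Because the arithmetic is done in integers, the comparison is an exact equality and not a floating-point approximation.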

The matrices $Ha$, $Hb$, $Xa$ and $Xb$ are all column vectors and, for any column vectors $C$ and $D$ of the same size, $C^TD$ is simply a scalar, hence $C^TD=D^TC$. In particular, $a^TX^THb=b^TH^TXa$. Using the fact that the trace of a $1\times1$ matrix is its unique coefficient and the facts that $\mathrm{tr}(CD)=\mathrm{tr}(DC)$ and $\mathrm{tr}(C+D)=\mathrm{tr}(C)+\mathrm{tr}(D)$ for all matrices $C$ and $D$ of suitable sizes, one gets $ v_X(H)=\mathrm{tr}(H^TXba^T)+\mathrm{tr}(b^TH^TXa)=\mathrm{tr}(H^T(Xba^T+Xab^T)). $ If one wishes to call derivative of a function $u$ at $X$ any matrix $W_X$ such that, for every $H$, $v_X(H)=\mathrm{tr}(H^TW_X)$ (see more about this in the comments), then, in the case at hand, the derivative at $X$ is $ W_X=X(ba^T+ab^T). $
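Taking $H$ to be the matrix with a single entry $1$ at position $(i,j)$ gives $\mathrm{tr}(H^TW_X)=(W_X)_{ij}$, so $W_X$ can be compared entry by entry with difference quotients of $u$. Since $u$ is quadratic in $X$, a central difference with step $1$ is exact. The sketch below assumes that setup; the helper names and the integer test data are illustrative, not taken from the answer.

```python
# Compare W_X = X (b a^T + a b^T) with entrywise central differences of
# u(X) = a^T X^T X b. Helper names and test data are illustrative only.

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# Arbitrary test data: X is 3x2, a and b are 2x1 column vectors.
X = [[1, 2], [3, -1], [0, 4]]
a = [[1], [-2]]
b = [[3], [1]]

def u(M):
    """u(M) = a^T M^T M b, returned as a plain number."""
    return matmul(matmul(matmul(transpose(a), transpose(M)), M), b)[0][0]

# Candidate derivative, same shape as X.
W = matmul(X, add(matmul(b, transpose(a)), matmul(a, transpose(b))))

def central_difference(i, j):
    """(u(X + E_ij) - u(X - E_ij)) / 2; the quadratic terms cancel exactly."""
    plus = [row[:] for row in X]
    minus = [row[:] for row in X]
    plus[i][j] += 1
    minus[i][j] -= 1
    return (u(plus) - u(minus)) // 2

numeric = [[central_difference(i, j) for j in range(len(X[0]))]
           for i in range(len(X))]
print(numeric == W)  # exact integer arithmetic, so this prints True
```

The difference $u(X+E)-u(X-E)$ equals $2v_X(E)$ exactly, which is why the integer division by $2$ loses nothing.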

  • 0
    @joriki As usual you are right... Post modified on this point. (Not quite unrelated: did you read the posts on meta about so-called *moderator-wars*?) 2012-10-01
0

When calculating the derivative of a matrix function, I strongly recommend using the most basic rule: $\mathrm{d}(f(X)g(X))=\mathrm{d}f(X)g(X)+f(X)\mathrm{d}g(X)$. I don't recommend using any higher-level formulas. On the one hand, you need to remember them or look them up before you can use them; on the other hand, you must be very clear about the conditions under which they are valid.

First, in the first equation you give, I believe the $x$ should be a vector. But in the problem that confuses you, the $X$ is a matrix. I don't think the first equation can be applied in this case.

Second, $a^TX^TXb$ is a scalar and $X$ is a matrix, so the derivative of $a^TX^TXb$ with respect to $X$ should be a matrix of the same shape as $X$, not a scalar (see eq. 1725 in 'matrix calculus').

Third, I will use the basic rule I gave at the beginning to calculate the derivative of $a^TX^TXb$ with respect to $X$.

$\begin{align} \mathrm{d}(a^TX^TXb) &= a^T(\mathrm{d}X)^TXb+a^TX^T\,\mathrm{d}X\,b \\ &= b^TX^T\,\mathrm{d}X\,a+a^TX^T\,\mathrm{d}X\,b \\ &= \mathrm{tr}(ab^TX^T\,\mathrm{d}X+ba^TX^T\,\mathrm{d}X) \\ &= \mathrm{tr}[(ab^T+ba^T)X^T\,\mathrm{d}X] \end{align}$

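Since a differential of the form $\mathrm{tr}(W^T\mathrm{d}X)$ corresponds to the derivative $W$, transposing $(ab^T+ba^T)X^T$ gives $\frac{\mathrm{d}(a^TX^TXb)}{\mathrm{d}X}=X(ba^T+ab^T)$.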
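As a cross-check, the same result can be obtained entry by entry. The sketch below assumes the layout convention in which the $(p,q)$ entry of the derivative is $\partial u/\partial X_{pq}$, where $u(X)=a^TX^TXb=\sum_{k,i,j}a_iX_{ki}X_{kj}b_j$; the index names are chosen here for illustration.

$\begin{align} \frac{\partial u}{\partial X_{pq}} &= \sum_{j}a_qX_{pj}b_j+\sum_{i}a_iX_{pi}b_q \\ &= (Xb)_p\,a_q+(Xa)_p\,b_q \\ &= \left[Xba^T+Xab^T\right]_{pq} \end{align}$

The first sum comes from differentiating the factor $X_{ki}$ (which forces $k=p$, $i=q$) and the second from differentiating $X_{kj}$ (which forces $k=p$, $j=q$).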

  • 0
    The PDF file linked to in the question does claim on p. 661 that the formula is applicable to matrix variables. 2012-10-01