
I'm familiar with calculating derivatives of vectors, and after some searching here on StackExchange I see how to differentiate the product of a scalar and a matrix, or of a vector and a matrix, but I'm puzzled about how to differentiate the product of an expression and its own transpose, $(y-Xw)^T(y-Xw)$, with respect to $w$. I know the solution is $X^T(y-Xw)$, but I'm not at all sure how that was obtained.

If I let $U = (y-Xw)^T$ and $V = (y-Xw)$, where $y$ and $w$ are vectors, $X$ is a matrix, and $T$ denotes the transpose, and then apply the familiar product rule from calculus, I get $-X^T(y-Xw) + (y-Xw)^T(-X)$. In the solution above, it looks like $U$ was differentiated and $V$ was left alone. What am I missing?

2 Answers

2

Define a new vector variable $$v=Xw-y$$ Then use the Frobenius inner product (which I'll denote with a colon, so that $a:b = a^Tb$ for vectors) to write the function and find its differential and gradient. The step that moves $X$ to the other side of the colon uses the identity $a:Xb = X^Ta:b$. $$\eqalign{ f &= v:v \cr\cr df &= 2v:dv \cr &= 2v:X\,dw \cr &= 2X^Tv:dw \cr\cr \frac{\partial f}{\partial w} &= 2X^Tv \cr &= 2X^T(Xw-y) }$$
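The gradient above can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy and arbitrary random data; the shapes, the seed, and the step size `eps` are illustrative choices and not part of the question. It compares $2X^T(Xw-y)$ against a central finite difference, and also evaluates the two product-rule terms from the question.

```python
import numpy as np

# Illustrative random problem: 5 observations, 3 parameters (assumed sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
w = rng.normal(size=3)

def f(w):
    """f(w) = (Xw - y)^T (Xw - y), the same as (y - Xw)^T (y - Xw)."""
    v = X @ w - y
    return v @ v

# Analytic gradient from the derivation: 2 X^T (Xw - y).
grad_analytic = 2 * X.T @ (X @ w - y)

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
grad_numeric = np.array([
    (f(w + eps * e) - f(w - eps * e)) / (2 * eps)
    for e in np.eye(w.size)
])

# The two product-rule terms from the question, with r = y - Xw.
r = y - X @ w
term1 = -X.T @ r      # -X^T (y - Xw), a column vector on paper
term2 = r @ (-X)      # (y - Xw)^T (-X), a row vector on paper

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))
print(np.allclose(term1, term2))
print(np.allclose(term1 + term2, grad_analytic))
```

NumPy's 1-D arrays do not distinguish rows from columns, which makes the point visible: the two product-rule terms hold the same numbers, once laid out as a column and once as a row. Written in a single layout they add up to $-2X^T(y-Xw) = 2X^T(Xw-y)$, so neither factor was "left alone".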

0

This looks like exactly the same derivation as for the ordinary least squares equation, but in matrix form. Page 8 of this document spells it all out:

http://isites.harvard.edu/fs/docs/icb.topic515975.files/OLSDerivation.pdf
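In case the link is unavailable, the usual expansion argument runs roughly as follows. This is a sketch of the standard textbook derivation, not a transcription of the linked document. With $f(w) = (y-Xw)^T(y-Xw)$, and using the fact that the scalar $y^TXw$ equals its own transpose $w^TX^Ty$:

$$\eqalign{ f &= y^Ty - 2w^TX^Ty + w^TX^TXw \cr\cr \frac{\partial f}{\partial w} &= -2X^Ty + 2X^TXw \cr &= -2X^T(y-Xw) }$$

Setting this to zero gives the normal equations $X^TXw = X^Ty$. The expression $X^T(y-Xw)$ quoted in the question agrees with this gradient up to the constant factor $-2$.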

  • The iSites platform has been retired (2018-06-28).