11

In linear regression, the loss function is expressed as

$$\frac1N \left\|XW-Y\right\|_{\text{F}}^2$$

where $X$, $W$, $Y$ are matrices. Taking the derivative with respect to $W$ yields

$$\frac 2N \, X^T(XW-Y)$$

Why is this so?

  • 0
    Don't be self-deprecating, your question is not dumb :) +1 2017-02-04

2 Answers

12

Let

$$\begin{array}{rl} f (\mathrm W) &:= \| \mathrm X \mathrm W - \mathrm Y \|_{\text{F}}^2 = \mbox{tr} \left( (\mathrm X \mathrm W - \mathrm Y)^{\top} (\mathrm X \mathrm W - \mathrm Y) \right)\\ &\,= \mbox{tr} \left( \mathrm W^{\top} \mathrm X^{\top} \mathrm X \mathrm W - \mathrm Y^{\top} \mathrm X \mathrm W - \mathrm W^{\top} \mathrm X^{\top} \mathrm Y + \mathrm Y^{\top} \mathrm Y \right)\end{array}$$

Differentiating with respect to $\mathrm W$,

$$\nabla_{\mathrm W} f (\mathrm W) = 2 \, \mathrm X^{\top} \mathrm X \mathrm W - 2 \, \mathrm X^{\top} \mathrm Y = 2 \, \mathrm X^{\top} \left( \mathrm X \mathrm W - \mathrm Y \right)$$
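
One way to fill in this step is the directional derivative, sketched here. The perturbation $\mathrm H$ (any matrix of the same shape as $\mathrm W$) and the residual $\mathrm R := \mathrm X \mathrm W - \mathrm Y$ are notation introduced only for this sketch. Expanding $f$ at $\mathrm W + \mathrm H$ gives

$$\begin{array}{rl} f (\mathrm W + \mathrm H) &= \| \mathrm R + \mathrm X \mathrm H \|_{\text{F}}^2\\ &= f (\mathrm W) + 2 \, \mbox{tr} \left( \mathrm R^{\top} \mathrm X \mathrm H \right) + \| \mathrm X \mathrm H \|_{\text{F}}^2\\ &= f (\mathrm W) + \mbox{tr} \left( \left( 2 \, \mathrm X^{\top} \mathrm R \right)^{\top} \mathrm H \right) + \| \mathrm X \mathrm H \|_{\text{F}}^2\end{array}$$

The last term is quadratic in $\mathrm H$, so the part that is linear in $\mathrm H$ is the Frobenius inner product of $2 \, \mathrm X^{\top} \mathrm R$ with $\mathrm H$. That identifies the gradient as $2 \, \mathrm X^{\top} \left( \mathrm X \mathrm W - \mathrm Y \right)$. Multiplying by $\frac 1N$ recovers the expression in the question.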

  • 0
    What about the derivative with respect to $X$? 2017-09-08
  • 0
    @Royi Aren't $\rm X$ and $\rm Y$ given? 2017-09-08
  • 1
    I'm just asking what the derivative of $\left\| X W - Y \right\|_{F}^{2}$ with respect to $X$ is. Just curious. 2017-09-08
  • 1
    I get that it is $2 \left( X W - Y \right) {W}^{T}$. 2017-09-08
  • 1
    @Royi I just got the exact same. 2017-09-08
  • 0
    How did you get $2X^TY$? Can you show the full steps of the differentiation? 2017-12-29
  • 1
    @kong If you want a step-by-step derivation, use the directional derivative. 2017-12-29
  • 0
    I did not study that in my course. How else can I compute the derivative? 2017-12-29
  • 1
    @kong You can always use the matrix cookbook. 2017-12-29
  • 0
    Wow, thank you, that is very helpful :) Now I just need to understand how the formulas are derived. 2017-12-29
  • 1
    @kong The derivatives of the linear terms are easy. Just use the properties of the trace and the definition of the Frobenius inner product. The derivative of the quadratic term is not so easy, but one can use the definition of the directional derivative. 2017-12-29
  • 0
    I can't find the derivative of $\mbox{tr}(W^TX^TXW)$ in the book. 2017-12-29
  • 1
    @kong Section 2.5.2, equation 108. 2017-12-29
  • 0
    I get $X^TXW + XX^TW$. How do I get $2X^TXW$ if $X$ is not square? 2017-12-29
  • 1
    @kong Your result is likely incorrect. 2017-12-29
  • 0
    But eq. 108 says $\frac{d}{dX} \mbox{tr}(X^TBX) = BX + B^TX$. So $\frac{d}{dW} \mbox{tr}(W^TX^TXW) = X^TXW + XX^TW$, right? 2017-12-29
  • 1
    @kong No, because $(X^T X)^T = X^T X$. When you transpose a product of matrices, the order is reversed. 2017-12-29
  • 0
    Oh, I can't believe I missed that. Thank you very much, kind sir! 2017-12-29
5

Let $X=(x_{ij})_{ij}$ and similarly for the other matrices. We are trying to differentiate $$ \|XW-Y\|^2=\sum_{i,j}\Big(\sum_{k}x_{ik}w_{kj}-y_{ij}\Big)^2\qquad (\star) $$ with respect to $W$. The result will be a matrix whose $(i,j)$ entry is the derivative of $(\star)$ with respect to the variable $w_{ij}$.

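So think of $(i,j)$ as being fixed now. Only the terms in $(\star)$ that belong to column $j$ of $XW-Y$ depend on $w_{ij}$, one for each row $k$, and in each of them $w_{ij}$ appears with coefficient $x_{ki}$. Taking their derivative with the chain rule gives $$ \frac{\partial\|XW-Y\|^2}{\partial w_{ij}}=\sum_{k}2x_{ki}\Big(\sum_{l}x_{kl}w_{lj}-y_{kj}\Big)=\left[2X^T(XW-Y)\right]_{i,j}. $$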
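
The entrywise formula can also be checked numerically. The sketch below is plain Python with no external libraries. It compares the entrywise sum, the matrix form $2X^T(XW-Y)$, and a central finite difference of the squared norm. The matrix sizes, the random seed, the step `h`, and all helper names are illustrative assumptions, and the $\frac1N$ factor from the question is left out, as in the answers.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def residual(X, W, Y):
    """Entries of XW - Y."""
    XW = matmul(X, W)
    return [[XW[i][j] - Y[i][j] for j in range(len(Y[0]))]
            for i in range(len(Y))]

def loss(X, W, Y):
    """Squared Frobenius norm of XW - Y (no 1/N factor)."""
    return sum(r * r for row in residual(X, W, Y) for r in row)

def grad_matrix(X, W, Y):
    """Matrix form: 2 X^T (XW - Y)."""
    return [[2.0 * g for g in row]
            for row in matmul(transpose(X), residual(X, W, Y))]

def grad_entrywise(X, W, Y):
    """Entry (i, j): sum_k 2 x_ki (sum_l x_kl w_lj - y_kj)."""
    n, d, m = len(X), len(W), len(W[0])
    return [[sum(2.0 * X[k][i]
                 * (sum(X[k][l] * W[l][j] for l in range(d)) - Y[k][j])
                 for k in range(n))
             for j in range(m)] for i in range(d)]

def grad_finite_diff(X, W, Y, h=1e-6):
    """Central difference in each entry w_ij, one entry at a time."""
    G = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            Wp = [row[:] for row in W]
            Wm = [row[:] for row in W]
            Wp[i][j] += h
            Wm[i][j] -= h
            G[i][j] = (loss(X, Wp, Y) - loss(X, Wm, Y)) / (2.0 * h)
    return G

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)]
            for _ in range(rows)]

# Assumed sizes: X is 5x3 (not square), W is 3x2, Y is 5x2.
rng = random.Random(0)
X = random_matrix(5, 3, rng)
W = random_matrix(3, 2, rng)
Y = random_matrix(5, 2, rng)

G_matrix = grad_matrix(X, W, Y)
G_entry = grad_entrywise(X, W, Y)
G_fd = grad_finite_diff(X, W, Y)

print("entrywise vs matrix form:", max_abs_diff(G_entry, G_matrix))
print("finite diff vs matrix form:", max_abs_diff(G_fd, G_matrix))
```

Both printed differences should be close to zero. Using a non-square $X$ also makes it clear that the gradient has the shape of $W$, which is why the term must be $X^TXW$ rather than anything involving $XX^T$.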

  • 0
    I found your answer very helpful! Thanks so much :). 2018-09-20