2

$$f(W)=(Ax-b)^TW(Ax-b)=x^TA^TWAx-2b^TWAx+b^TWb$$

where $f$ is a scalar function of the matrix $W$, $A$ is a known matrix, and $x$ and $b$ are vectors ($b$ is known). How do I compute $\frac{\partial f}{\partial W}$?

  • 0
    $x$ and $b$ are fixed, right? If $Ax=b$, then $f(W)=0$ for all $W$. If $Ax \neq b$, then $f(W)$ is unbounded above and below. Is there more to this problem that you haven't told us? 2017-02-13
  • 0
    @BrianBorchers Sorry, I misunderstood the problem. I have edited the question so that it is now correct and clean. 2017-02-13
  • 0
    Try $W=\alpha I$ and $W=-\alpha I$, where $\alpha$ is a very large number (say $1 \times 10^{300}$). 2017-02-13
  • 0
    @BrianBorchers Thanks! 2017-02-13

2 Answers

2

Define the vector $$y=Ax-b$$ and write the function in terms of this new variable and the double-dot (aka Frobenius) product, $P:Q = \operatorname{tr}(P^TQ)$.

In this form, the differential and gradient are easy to calculate: $$\eqalign{ f &= y^TWy = yy^T:W \cr df &= yy^T:dW \cr \frac{\partial f}{\partial W} &= yy^T \cr }$$
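A quick numerical sanity check of this gradient can be sketched with NumPy. The dimensions, the random test data, and the helper names `f`, `grad_f`, and `numerical_grad` are illustrative assumptions, not part of the question. The sketch compares $yy^T$ against central finite differences taken entry by entry in $W$, treating every entry of $W$ as an independent variable.

```python
import numpy as np

def f(W, A, x, b):
    """Scalar objective (Ax - b)^T W (Ax - b)."""
    y = A @ x - b
    return y @ W @ y

def grad_f(A, x, b):
    """Closed-form gradient with respect to W: the outer product y y^T."""
    y = A @ x - b
    return np.outer(y, y)

def numerical_grad(W, A, x, b, eps=1e-6):
    """Central finite differences, perturbing one entry of W at a time."""
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            G[i, j] = (f(W + E, A, x, b) - f(W - E, A, x, b)) / (2 * eps)
    return G

# Hypothetical test data: A is 4x3, so W must be 4x4.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
b = rng.standard_normal(4)
W = rng.standard_normal((4, 4))

analytic = grad_f(A, x, b)
numeric = numerical_grad(W, A, x, b)
print(np.allclose(analytic, numeric, atol=1e-6))
```

Because $f$ is linear in $W$, the finite-difference estimate should agree with $yy^T$ up to floating-point rounding, and the result does not depend on the particular $W$ at which it is evaluated.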

  • 0
    Should that be $y^Ty$? 2017-02-12
  • 2
    @James No, $y^Ty$ is a scalar, and the gradient with respect to a matrix is a matrix, in the same way that the gradient of a scalar function with respect to a vector argument is a vector and not a scalar. 2017-02-12
1

$$f (\mathrm W) := (\mathrm A \mathrm x - \mathrm b)^{\top} \mathrm W (\mathrm A \mathrm x - \mathrm b) = \mbox{tr} \left( (\mathrm A \mathrm x - \mathrm b)(\mathrm A \mathrm x - \mathrm b)^{\top} \mathrm W \right) = \langle (\mathrm A \mathrm x - \mathrm b)(\mathrm A \mathrm x - \mathrm b)^{\top}, \mathrm W \rangle$$

Hence,

$$f ' (\mathrm W) = \color{blue}{(\mathrm A \mathrm x - \mathrm b)(\mathrm A \mathrm x - \mathrm b)^{\top}}$$
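The same result can be checked entry by entry. As a sketch, write $\mathrm y := \mathrm A \mathrm x - \mathrm b$ (a shorthand introduced here for brevity) and assume the entries of $\mathrm W$ are independent variables, i.e., no symmetry constraint is imposed on $\mathrm W$:

$$f (\mathrm W) = \sum_{i,j} y_i \, W_{ij} \, y_j \quad \Longrightarrow \quad \frac{\partial f}{\partial W_{ij}} = y_i \, y_j = \left( \mathrm y \mathrm y^{\top} \right)_{ij}$$

Collecting these partial derivatives into a matrix gives $\mathrm y \mathrm y^{\top}$, which matches the expression above.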