Let $A$ be an $m\times n$ real matrix, $x$ an $n\times 1$ vector and $b$ an $m\times 1$ vector. I want to compute \begin{equation} \dfrac{\partial }{\partial x} \Vert Ax+b\Vert^{2}. \end{equation} First, I expanded \begin{equation} \Vert Ax+b\Vert^{2}=(Ax+b)^{T}(Ax+b)=x^{T}A^{T}Ax+2x^{T}A^{T}b+b^{T}b \end{equation} then I computed \begin{eqnarray} \dfrac{\partial }{\partial x}(x^{T}A^{T}Ax+2x^{T}A^{T}b+b^{T}b)=A^{T}Ax+x^{T}A^{T}A+2A^{T}b \end{eqnarray} but I know the above is wrong, since $A^{T}Ax$ and $x^{T}A^{T}A$ do not have the same dimension. Thanks for the help.

Rather than expanding first, do the opposite. Define a new vector $$y=Ax+b$$ and write the function in terms of this new variable and the Frobenius product (which I'll denote by a colon). This approach reduces the visual "clutter". You can then expand the results after finding the derivative.
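To make the colon notation concrete, here is a minimal pure-Python sketch (matrices stored as lists of rows; the helper names `frob`, `matmul`, `transpose` and `trace` are illustrative, not from any library). It checks that $X:Y$, the sum of elementwise products, agrees with ${\rm tr}(X^TY)$, and that for a column vector $y$ the product $y:y$ is $\|y\|^2$.

```python
import random

def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Plain matrix product X Y."""
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def trace(X):
    """Sum of the diagonal entries of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))

def frob(X, Y):
    """Frobenius product X:Y = sum over i, j of X[i][j] * Y[i][j]."""
    return sum(a * b for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

rng = random.Random(0)  # arbitrary seed, for reproducibility only
X = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # 3 x 2
Y = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # 3 x 2

# X:Y should match tr(X^T Y)
print(frob(X, Y), trace(matmul(transpose(X), Y)))

# A vector is an m x 1 matrix, so y:y is its squared Euclidean norm
y = [[rng.uniform(-1, 1)] for _ in range(4)]
print(frob(y, y), sum(row[0] ** 2 for row in y))
```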

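With the Frobenius product, finding the gradient is easy and fool-proof. Since $b$ is constant, $dy=A\,dx$, and
$$\eqalign{ f &= \|y\|^2 = y:y \cr \cr df &= 2\,y:dy \cr &= 2\,y:A\,dx \cr &= 2\,A^Ty:dx\cr \cr \frac{\partial f}{\partial x} &= 2\,A^Ty \cr &= 2\,A^T(Ax+b) \cr &= 2\,A^TAx+2\,A^Tb \cr }$$
In particular, the quadratic term $x^TA^TAx$ contributes $2A^TAx$ to the gradient, not $A^TAx+x^TA^TA$.

The rules for rearranging the Frobenius product, where $A,B,C$ here denote arbitrary matrices of compatible shapes,
$$\eqalign{ A:B &= B:A \cr A:BC &= B^TA:C = AC^T:B\cr }$$
can be derived from the familiar properties of the trace, since
$$A:B={\rm tr}(A^TB).$$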
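A quick way to gain confidence in the result is a finite-difference check. The sketch below is plain Python on small random data; the seed, the sizes $m=4$, $n=3$, the step $h=10^{-6}$ and the helper names are arbitrary choices, not part of the derivation. It evaluates $2A^T(Ax+b)$, the expanded form $2A^TAx+2A^Tb$, and central differences of $\|Ax+b\|^2$; the three vectors should agree up to rounding error.

```python
import random

def matvec(M, v):
    """Matrix-vector product, M stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def residual(A, b, x):
    """y = A x + b."""
    return [ax + bi for ax, bi in zip(matvec(A, x), b)]

def f(A, b, x):
    """f(x) = ||A x + b||^2 = y:y."""
    return sum(v * v for v in residual(A, b, x))

def grad_closed_form(A, b, x):
    """Gradient from the derivation: 2 A^T (A x + b)."""
    return [2 * g for g in matvec(transpose(A), residual(A, b, x))]

def grad_expanded(A, b, x):
    """Same gradient in expanded form: 2 A^T A x + 2 A^T b."""
    At = transpose(A)
    AtAx = matvec(At, matvec(A, x))
    Atb = matvec(At, b)
    return [2 * p + 2 * q for p, q in zip(AtAx, Atb)]

def grad_finite_diff(A, b, x, h=1e-6):
    """Central differences (f(x + h e_i) - f(x - h e_i)) / (2h), one coordinate at a time."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(A, b, xp) - f(A, b, xm)) / (2 * h))
    return g

rng = random.Random(1)
m, n = 4, 3  # A is m x n, x has n entries, b has m entries
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
b = [rng.uniform(-1, 1) for _ in range(m)]
x = [rng.uniform(-1, 1) for _ in range(n)]

exact = grad_closed_form(A, b, x)
print(exact)
print(grad_expanded(A, b, x))
print(grad_finite_diff(A, b, x))
```

Because $f$ is quadratic in $x$, the central difference has no truncation error, so a mismatch larger than rounding error would point to a mistake in the formula.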