
I'm attempting to use stochastic gradient descent for a problem, but I'm stuck deriving the update rules. Specifically, I'm confused about the partial derivatives below. Any help would be appreciated!

I will only show the part of the equation that I am stuck on, so as not to complicate my question unnecessarily.

$\mathcal{L} = ||\mathbf{Wu - v}||_{Fro}^{2}$

where $\mathbf{W} \in \mathbb{R}^{n \times n}$ and $\mathbf{u,v} \in \mathbb{R}^{n \times 1}$

I'm trying to find the following:

$\frac{\partial \mathcal{L}}{\partial \mathbf{u}} = $ ?

$\frac{\partial \mathcal{L}}{\partial \mathbf{v}} = $ ?

$\frac{\partial \mathcal{L}}{\partial \mathbf{W}} = $ ?

Thank you!


Edit: The below is my progress...

Let $f(\mathbf{x}) = ||\mathbf{x}||^2_{Fro}$ and $g(\mathbf{W}) = \mathbf{Wu -v}$

then $f(\mathbf{x}) = \mathbf{x^\top x}$, so $f' = 2\mathbf{x}$, and $g' = \mathbf{u^\top}$

By the chain rule, $\frac{\partial \mathcal{L}}{\partial \mathbf{W}} = f'(g(\mathbf{W}))g'(\mathbf{W})$

$\frac{\partial \mathcal{L}}{\partial \mathbf{W}} = f'(\mathbf{Wu -v})(\mathbf{u^\top}) = 2(\mathbf{Wu -v})(\mathbf{u^\top})$

...which makes sense as this results in an $n \times n$ matrix to update $\mathbf{W}$ with.
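One way to sanity-check this expression is to compare it against a finite-difference approximation. The sketch below is plain Python with no dependencies. The helper names (`residual`, `loss`, `grad_W`, `numeric_grad_W`), the random test data, and the step size `eps` are assumptions made for illustration and are not part of the original problem.

```python
import random

def residual(W, u, v):
    # r = W u - v, with W as a list of rows and u, v as flat lists
    return [sum(w * uk for w, uk in zip(row, u)) - vi for row, vi in zip(W, v)]

def loss(W, u, v):
    # L = ||W u - v||^2 (for a vector, Frobenius norm = Euclidean norm)
    return sum(r * r for r in residual(W, u, v))

def grad_W(W, u, v):
    # Candidate gradient: 2 (W u - v) u^T, an n x n outer product
    r = residual(W, u, v)
    return [[2.0 * ri * uj for uj in u] for ri in r]

def numeric_grad_W(W, u, v, eps=1e-5):
    # Central differences, perturbing one entry of W at a time
    n = len(W)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            W_plus = [row[:] for row in W]
            W_minus = [row[:] for row in W]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            G[i][j] = (loss(W_plus, u, v) - loss(W_minus, u, v)) / (2 * eps)
    return G

random.seed(0)  # arbitrary seed, only for reproducibility
n = 3
W = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]

analytic_W = grad_W(W, u, v)
numeric_W = numeric_grad_W(W, u, v)
max_err_W = max(
    abs(analytic_W[i][j] - numeric_W[i][j]) for i in range(n) for j in range(n)
)
print(max_err_W)  # should be tiny if the formula is right
```

If the two gradients agree up to rounding error, the outer-product form $2(\mathbf{Wu - v})\mathbf{u^\top}$ is consistent with the loss as defined.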


Similarly for $f(\mathbf{x})$, but let $g(\mathbf{v})=\mathbf{Wu -v}$

then $f(\mathbf{x}) = \mathbf{x^\top x}$, so $f' = 2\mathbf{x}$, and $g' = -1$ ?

By the chain rule, $\frac{\partial \mathcal{L}}{\partial \mathbf{v}} = f'(g(\mathbf{v}))g'(\mathbf{v})$

$\frac{\partial \mathcal{L}}{\partial \mathbf{v}} = f'(\mathbf{Wu -v})(-1) = -2(\mathbf{Wu -v})$

...this results in an $n \times 1$ vector to update $\mathbf{v}$ with.


Similarly for $f(\mathbf{x})$, but let $g(\mathbf{u})=\mathbf{Wu -v}$

then $f(\mathbf{x}) = \mathbf{x^\top x}$, so $f' = 2\mathbf{x}$, and $g' = \mathbf{W^\top}$ ?

By the chain rule, $\frac{\partial \mathcal{L}}{\partial \mathbf{u}} = f'(g(\mathbf{u}))g'(\mathbf{u})$

$\frac{\partial \mathcal{L}}{\partial \mathbf{u}} = f'(\mathbf{Wu -v})(\mathbf{W^\top}) = 2(\mathbf{Wu -v})(\mathbf{W^\top})$

...however, this results in a vector-matrix multiplication where the dimensions do not match (i.e., $n \times 1$ times $n \times n$).

If it were $= 2(\mathbf{W^\top})(\mathbf{Wu -v})$, then it would result in a properly formatted $n \times 1$ vector to update $\mathbf{u}$ with, but I'm worried that I do not understand why this reordering is justified.
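For what it's worth, the reordered form can be motivated by the differential: $d\mathcal{L} = 2(\mathbf{Wu - v})^\top \mathbf{W}\, d\mathbf{u}$ is a $1 \times n$ row acting on $d\mathbf{u}$, and writing it as a column gives $2\mathbf{W^\top}(\mathbf{Wu - v})$. A numerical check is one way to gain confidence in this. The sketch below is plain Python; the helper names (`matvec`, `loss`, `grad_u`, `numeric_grad_u`), the random data, and `eps` are assumptions for illustration only.

```python
import random

def matvec(M, x):
    # Matrix-vector product M x for a list-of-rows matrix
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]

def loss(W, u, v):
    # L = ||W u - v||^2
    return sum((a - b) ** 2 for a, b in zip(matvec(W, u), v))

def grad_u(W, u, v):
    # Candidate gradient: 2 W^T (W u - v), a length-n vector
    r = [a - b for a, b in zip(matvec(W, u), v)]
    W_t = [list(col) for col in zip(*W)]  # transpose of W
    return [2.0 * g for g in matvec(W_t, r)]

def numeric_grad_u(W, u, v, eps=1e-5):
    # Central differences, perturbing one entry of u at a time
    g = []
    for j in range(len(u)):
        u_plus, u_minus = u[:], u[:]
        u_plus[j] += eps
        u_minus[j] -= eps
        g.append((loss(W, u_plus, v) - loss(W, u_minus, v)) / (2 * eps))
    return g

random.seed(1)  # arbitrary seed, only for reproducibility
n = 3
W = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]

analytic_u = grad_u(W, u, v)
numeric_u = numeric_grad_u(W, u, v)
max_err_u = max(abs(a - b) for a, b in zip(analytic_u, numeric_u))
print(len(analytic_u), max_err_u)  # length n, error should be tiny
```

Agreement here suggests the transpose-first ordering is a consequence of laying the gradient out as a column with the same shape as $\mathbf{u}$, rather than an ad hoc fix.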


  • I believe matrix-vector calculus has the same rules. A Google Search for the Matrix Cookbook has some common rules. Also... chain rule. 2017-01-13
  • @SeanRoberson, thanks for your quick reply and suggestion! I have been attempting to use the Matrix Cookbook to no avail for this term of my equation. 2017-01-13
  • Look closer, see equations (132) and (136). 2017-01-13
  • @LinAlg I went back and tried to work them out, but ran into another snag of either miscalculation or not interpreting a property correctly when solving. Would you mind giving it a look? Note that I'm very grateful to you both for helping me to figure this out, I much prefer the help as compared to an answer with no understanding :) 2017-01-14
  • In the second case, $g'= I$. In the third case (the order does not correspond to the questions), $g' = W$, although indeed it does not match. Maybe you always need to transpose $Wu-v$. 2017-01-14
