I'm trying to use stochastic gradient descent for a problem, but I'm stuck deriving the update rules. Specifically, I'm confused about the partial derivatives below. Any help would be appreciated!
I will only show the part of the objective that I am stuck on, so as not to complicate my question unnecessarily.
$\mathcal{L} = ||\mathbf{Wu - v}||_{Fro}^{2}$
where $\mathbf{W} \in \mathbb{R}^{n \times n}$ and $\mathbf{u,v} \in \mathbb{R}^{n \times 1}$
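For reference, since $\mathbf{Wu - v}$ is an $n \times 1$ vector, I believe the Frobenius norm here reduces to the ordinary Euclidean norm, so the loss can be written out entry by entry (the index names $i, j$ are my own notation):
$\mathcal{L} = (\mathbf{Wu - v})^\top(\mathbf{Wu - v}) = \sum_{i=1}^{n}\Big(\sum_{j=1}^{n} W_{ij}u_j - v_i\Big)^2$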
I'm trying to compute the following:
$\frac{\partial \mathcal{L}}{\partial \mathbf{u}} = $ ?
$\frac{\partial \mathcal{L}}{\partial \mathbf{v}} = $ ?
$\frac{\partial \mathcal{L}}{\partial \mathbf{W}} = $ ?
Thank you!
Edit: Below is my progress so far...
Let $f(\mathbf{x}) = ||\mathbf{x}||^2_{Fro} = \mathbf{x^\top x}$ and $g(\mathbf{W}) = \mathbf{Wu -v}$, so that $\mathcal{L} = f(g(\mathbf{W}))$
then $f'(\mathbf{x}) = 2\mathbf{x} $ and $g' = \mathbf{u^\top}$
By the chain rule, $\frac{\partial \mathcal{L}}{\partial \mathbf{W}} = f'(g(\mathbf{W}))g'(\mathbf{W})$
$\frac{\partial \mathcal{L}}{\partial \mathbf{W}} = f'(\mathbf{Wu -v})(\mathbf{u^\top}) = 2(\mathbf{Wu -v})(\mathbf{u^\top})$
...which makes sense as this results in an $n \times n$ matrix to update $\mathbf{W}$ with.
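As a sanity check on the $\mathbf{W}$ result, here is a minimal finite-difference sketch. It is my own illustration, not part of the derivation: it assumes NumPy, random test data with $n = 4$, and 1-D arrays standing in for the $n \times 1$ vectors; the helper names `loss` and `numerical_grad_W` are hypothetical.

```python
import numpy as np

def loss(W, u, v):
    """Squared Euclidean norm of the residual Wu - v."""
    r = W @ u - v
    return float(r @ r)

def numerical_grad_W(W, u, v, eps=1e-6):
    """Central-difference estimate of dL/dW, one entry at a time."""
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            G[i, j] = (loss(W + E, u, v) - loss(W - E, u, v)) / (2 * eps)
    return G

# Random test data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))
u = rng.standard_normal(n)
v = rng.standard_normal(n)

# Candidate from the derivation: 2 (Wu - v) u^T, an n x n outer product.
analytic_W = 2 * np.outer(W @ u - v, u)
numeric_W = numerical_grad_W(W, u, v)

print(analytic_W.shape)
print(np.allclose(analytic_W, numeric_W, atol=1e-5))
```

If the two arrays agree to within the tolerance, that would support the outer-product form $2(\mathbf{Wu - v})\mathbf{u^\top}$, at least numerically.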
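Similarly for $f(\mathbf{x}) = ||\mathbf{x}||^2_{Fro}$, but let $g(\mathbf{v})=\mathbf{Wu -v}$, so that $\mathcal{L} = f(g(\mathbf{v}))$
then $f'(\mathbf{x}) = 2\mathbf{x} $ and $g' = -1$ ?
By the chain rule, $\frac{\partial \mathcal{L}}{\partial \mathbf{v}} = f'(g(\mathbf{v}))g'(\mathbf{v})$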
$\frac{\partial \mathcal{L}}{\partial \mathbf{v}} = f'(\mathbf{Wu -v})(-1) = -2(\mathbf{Wu -v})$
...this results in an $n \times 1$ vector to update $\mathbf{v}$ with.
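The same kind of numerical check can be sketched for the $\mathbf{v}$ result. Again this is only an illustration under my own assumptions: NumPy, random test data with $n = 4$, 1-D arrays in place of $n \times 1$ vectors, and a hypothetical helper `numerical_grad_v`.

```python
import numpy as np

def loss(W, u, v):
    """Squared Euclidean norm of the residual Wu - v."""
    r = W @ u - v
    return float(r @ r)

def numerical_grad_v(W, u, v, eps=1e-6):
    """Central-difference estimate of dL/dv, one component at a time."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = eps
        g[i] = (loss(W, u, v + e) - loss(W, u, v - e)) / (2 * eps)
    return g

# Random test data (an assumption for illustration only).
rng = np.random.default_rng(1)
n = 4
W = rng.standard_normal((n, n))
u = rng.standard_normal(n)
v = rng.standard_normal(n)

# Candidate from the derivation: -2 (Wu - v).
analytic_v = -2 * (W @ u - v)
numeric_v = numerical_grad_v(W, u, v)

print(np.allclose(analytic_v, numeric_v, atol=1e-5))
```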
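Similarly for $f(\mathbf{x}) = ||\mathbf{x}||^2_{Fro}$, but let $g(\mathbf{u})=\mathbf{Wu -v}$, so that $\mathcal{L} = f(g(\mathbf{u}))$
then $f'(\mathbf{x}) = 2\mathbf{x} $ and $g' = \mathbf{W^\top}$ ?
By the chain rule, $\frac{\partial \mathcal{L}}{\partial \mathbf{u}} = f'(g(\mathbf{u}))g'(\mathbf{u})$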
$\frac{\partial \mathcal{L}}{\partial \mathbf{u}} = f'(\mathbf{Wu -v})(\mathbf{W^\top}) = 2(\mathbf{Wu -v})(\mathbf{W^\top})$
...however, this results in a vector-matrix multiplication where the dimensions do not match (i.e., $n \times 1$ and $n \times n $).
If it were $= 2(\mathbf{W^\top})(\mathbf{Wu -v})$, then it would result in a properly formatted $ n \times 1 $ vector to update $\mathbf{u}$ with; however, I'm worried that I do not understand why this reordering is justified.
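To at least test the two candidate forms for $\mathbf{u}$ numerically, here is a minimal sketch. It assumes NumPy and random test data with $n = 4$, this time keeping $\mathbf{u}$ and $\mathbf{v}$ as explicit $n \times 1$ column arrays so that the shape mismatch shows up; the helper names `loss` and `numerical_grad_u` are hypothetical.

```python
import numpy as np

def loss(W, u, v):
    """Squared Euclidean norm of the residual Wu - v (column vectors)."""
    r = W @ u - v                      # shape (n, 1)
    return float((r.T @ r).item())

def numerical_grad_u(W, u, v, eps=1e-6):
    """Central-difference estimate of dL/du, one component at a time."""
    g = np.zeros_like(u)
    for i in range(u.shape[0]):
        e = np.zeros_like(u)
        e[i, 0] = eps
        g[i, 0] = (loss(W, u + e, v) - loss(W, u - e, v)) / (2 * eps)
    return g

# Random test data (an assumption for illustration only).
rng = np.random.default_rng(2)
n = 4
W = rng.standard_normal((n, n))
u = rng.standard_normal((n, 1))
v = rng.standard_normal((n, 1))
r = W @ u - v

# First candidate: 2 (Wu - v) W^T, i.e. (n, 1) times (n, n).
try:
    bad = 2 * r @ W.T
    shapes_match = True
except ValueError:
    shapes_match = False               # inner dimensions 1 and n disagree

# Second candidate: 2 W^T (Wu - v), i.e. (n, n) times (n, 1).
analytic_u = 2 * W.T @ r
numeric_u = numerical_grad_u(W, u, v)

# The row-vector form 2 (Wu - v)^T W is the transpose of the second candidate.
row_form = 2 * r.T @ W                 # shape (1, n)

print(shapes_match)
print(np.allclose(analytic_u, numeric_u, atol=1e-5))
print(np.allclose(row_form.T, analytic_u))
```

If the second candidate matches the finite-difference estimate, that would suggest $2\mathbf{W^\top}(\mathbf{Wu - v})$ is the right update direction, although a numerical match alone does not explain why the factors have to be reordered.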