3

I am trying to find the following derivative:

$ \frac{\partial}{\partial W}W^TW $

where $W \in \mathbb{R}^{n\times m}$ is a matrix. I am also interested in the related derivative

$ \frac{\partial}{\partial W}\|W^TW\|_\mathcal{F}^2 $

I am aware of the fact that

$ \frac{\partial}{\partial X}\|X\|_\mathcal{F}^2 = \frac{\partial}{\partial X}\text{Tr}(XX^T) = 2X $

But I am not sure whether differentiation with respect to $W$ is possible at all. Please advise.

Thank you very much.

  • 0
    @DidierPiau - oops, sorry, I didn't realize that. I don't even remember correcting it, so I guess someone else just did. Thanks for pointing it out. 2011-11-10

3 Answers

4

When you write $\frac{\partial}{\partial A}B$ where $A$ and $B$ are matrices, what you are understood to mean is

$\frac{\partial}{\partial A_{ij}}B_{kl}$

which is a rank-4 tensor. It is common to contract over one or more of those indices, but it's not necessary.

Going to index notation, $(W^TW)_{kl}=W_{pk}W_{pl}$, with summation over the repeated index $p$ implied, and therefore

$ \begin{align} \left[\frac{\partial}{\partial W}(W^TW)\right]_{ijkl} & = \frac{\partial}{\partial W_{ij}}(W_{pk}W_{pl}) \\ & = \frac{\partial W_{pk}}{\partial W_{ij}} W_{pl} + W_{pk} \frac{\partial W_{pl}}{\partial W_{ij}} \\ & = \delta_{ip} \delta_{jk} W_{pl} + \delta_{ip}\delta_{jl}W_{pk} \\ & = \delta_{jk} W_{il} + \delta_{jl} W_{ik} \end{align} $
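The closed form $\delta_{jk} W_{il} + \delta_{jl} W_{ik}$ can be sanity-checked numerically. The snippet below is a minimal sketch, assuming NumPy; the sizes, the random seed, and the step `eps` are arbitrary choices and not part of the answer. It builds the rank-4 tensor with `einsum` and compares it with central finite differences of $W^TW$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                       # W is n x m, as in the question
W = rng.standard_normal((n, m))

# Closed form: T[i, j, k, l] = delta_jk * W_il + delta_jl * W_ik
I = np.eye(m)
T = np.einsum("jk,il->ijkl", I, W) + np.einsum("jl,ik->ijkl", I, W)

# Central finite differences of B = W^T W with respect to each entry W_ij
eps = 1e-6
T_fd = np.zeros((n, m, m, m))
for i in range(n):
    for j in range(m):
        E = np.zeros_like(W)
        E[i, j] = eps
        T_fd[i, j] = ((W + E).T @ (W + E) - (W - E).T @ (W - E)) / (2 * eps)

# Largest entrywise discrepancy between the two tensors
max_err = np.abs(T - T_fd).max()
print(max_err)
```

Because $W^TW$ is quadratic in $W$, the central difference is exact up to rounding, so the printed discrepancy should be tiny.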

If you now choose to contract over a pair of indices you get a rank-2 tensor (a matrix). For example, if you contract over $j$ and $k$ you end up with

$ \begin{align} \delta_{jj} W_{il} + \delta_{jl} W_{ij} & = (m+1) W_{il} \end{align} $

where $\delta_{jj}=m$ is the number of values the index $j$ runs over, i.e. the number of columns of $W \in \mathbb{R}^{n\times m}$.
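The contraction can be checked the same way. In this sketch (NumPy assumed; the $5\times 3$ shape is an arbitrary illustration), the trace $\delta_{jj}$ counts the values taken by the column index $j$, so the contracted tensor should equal $W$ scaled by the number of columns plus one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3                       # arbitrary illustrative sizes
W = rng.standard_normal((n, m))

# Rank-4 derivative: T[i, j, k, l] = delta_jk * W_il + delta_jl * W_ik
I = np.eye(m)
T = np.einsum("jk,il->ijkl", I, W) + np.einsum("jl,ik->ijkl", I, W)

# Contract j with k, i.e. sum T[i, j, j, l] over j
contracted = np.einsum("ijjl->il", T)

# delta_jj equals the range of j (the number of columns), plus one
# from the second term
expected = (m + 1) * W
print(np.allclose(contracted, expected))
```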

If you need to read up about index notation you might want to take a look at this set of example questions and answers, which I found very helpful when I was learning it for the first time.

To apply this to the second part of your question, apply the multivariable chain rule as usual.
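As a sketch of that chain-rule step (the shorthand $B = W^TW$ is introduced here and is not part of the question), combining the rank-4 result above with $\|B\|_\mathcal{F}^2 = B_{kl}B_{kl}$ gives

$ \begin{align} \frac{\partial}{\partial W_{ij}}\|W^TW\|_\mathcal{F}^2 & = 2B_{kl}\frac{\partial B_{kl}}{\partial W_{ij}} \\ & = 2B_{kl}\left(\delta_{jk} W_{il} + \delta_{jl} W_{ik}\right) \\ & = 2W_{il}B_{jl} + 2W_{ik}B_{kj} \\ & = 4(WW^TW)_{ij} \end{align} $

where the last step uses the symmetry of $B$. The following NumPy snippet is one way to sanity-check this closed form against central finite differences; the sizes, seed, and step `eps` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3                       # arbitrary illustrative sizes
W = rng.standard_normal((n, m))


def f(W):
    """Squared Frobenius norm of W^T W."""
    B = W.T @ W
    return np.sum(B * B)


# Candidate closed form from the index calculation: 4 W W^T W
grad_closed = 4 * W @ W.T @ W

# Central finite differences, one entry of W at a time
eps = 1e-5
grad_fd = np.zeros_like(W)
for i in range(n):
    for j in range(m):
        E = np.zeros_like(W)
        E[i, j] = eps
        grad_fd[i, j] = (f(W + E) - f(W - E)) / (2 * eps)

# Largest entrywise discrepancy between the two gradients
max_err = np.abs(grad_closed - grad_fd).max()
print(max_err)
```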

  • 0
    That's great! Thank you very much. 2011-11-11
1

The best reference on this and similar problems is the book

J. R. Magnus and H. Neudecker, "Matrix Differential Calculus with Applications in Statistics and Econometrics", Wiley.

0

This is a pretty confused question.

$\|X\|_\mathcal{F}^2 = \text{Tr}(X X^{T}) = \sum_{kl}X_{kl}^2$

is the square of the Frobenius norm for matrices.

The Fréchet derivative w.r.t. $X$ of this can be found by going to index notation:

$\left[\frac{\partial}{\partial X}\left(\text{Tr}(X X^{T})\right)\right]_{ij} = \frac{\partial}{\partial X_{ij}}\left(\sum_{kl}X_{kl}^2\right) = \sum_{kl}2 X_{kl} \delta_{ik}\delta_{jl} = 2 X_{ij} \; .$
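A quick numerical confirmation of the $2X$ result, as a minimal sketch assuming NumPy (the helper name `frob_sq`, the matrix shape, and the step size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 3))   # arbitrary illustrative shape


def frob_sq(X):
    """Squared Frobenius norm, written as Tr(X X^T)."""
    return np.trace(X @ X.T)


# Central finite differences with respect to each entry X_ij
eps = 1e-6
grad_fd = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = eps
        grad_fd[i, j] = (frob_sq(X + E) - frob_sq(X - E)) / (2 * eps)

# Compare with the closed form 2X
matches = np.allclose(grad_fd, 2 * X, atol=1e-6)
print(matches)
```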

In the same way, one can work out

$\frac{\partial}{\partial W}W^TW \; .$

The result, however, is not a matrix but a rank-4 tensor.

A further reference about matrix calculus.

  • 0
    Hi kimjj, here is a [reference about matrix calculus](http://www.jstor.org/stable/2028606) that could be helpful. 2011-11-10