3

I am trying to find the following derivative:

$$ \frac{\partial}{\partial W}W^TW $$

where $W \in \mathbb{R}^{n\times m}$ is a matrix. I am also interested in the related derivative

$$ \frac{\partial}{\partial W}\|W^TW\|_\mathcal{F}^2 $$

I am aware of the fact that

$$ \frac{\partial}{\partial X}\|X\|_\mathcal{F}^2 = \frac{\partial}{\partial X}Tr(XX^T) = 2X $$

But I am not sure whether differentiating with respect to $W$ is possible at all here. Please advise.

Thank you very much.

  • 2
    What kind of objects are $W$ and $X$? Matrices, operators? What is $\mathcal F$? How come $\text{Tr}(XX^T)=2X$ if $X$ is an operator or a matrix? 2011-11-10
  • 0
    Sorry, yes, I forgot to say that both $W$ and $X$ are matrices and that $\mathcal{F}$ denotes the Frobenius norm. 2011-11-10
  • 0
    What is $\frac{\partial}{\partial W}$? 2011-11-10
  • 0
    Then $\mathrm{Tr}(XX^T)$ is not $2X$. 2011-11-10
  • 0
    @AlexeiAverchenko - Maybe this notation would be better? $\frac{\partial W^TW}{\partial W}$ should denote the partial derivative of $W^TW$ with respect to $W$. 2011-11-10
  • 0
    @Alexei: It's the [Fréchet derivative](http://en.wikipedia.org/wiki/Fr%C3%A9chet_derivative). 2011-11-10
  • 0
    @DidierPiau - According to the Matrix Cookbook, $\frac{\partial Tr(X^TX)}{\partial X} = 2X$. If, however, we replace $X$ with $W^TW$, then the derivative is different, and that's where I have trouble. 2011-11-10
  • 0
    Until about 20 minutes ago, there was no differential sign in front of $\mathrm{Tr}(XX^T)$ so you were in effect equating a **number** with a **matrix**. 2011-11-10
  • 0
    @DidierPiau - oops, sorry, I didn't realize that. I don't even remember correcting it, so I guess someone else just did. Thanks for pointing it out. 2011-11-10

3 Answers

4

When you write $\frac{\partial}{\partial A}B$ where $A$ and $B$ are matrices, what you are understood to mean is

$$\frac{\partial}{\partial A_{ij}}B_{kl}$$

which is a rank-4 tensor. It is common to contract over one or more of those indices, but it's not necessary.

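Going to index notation, with summation over repeated indices implied, $(W^TW)_{kl}=W_{mk}W_{ml}$ (here $m$ is a dummy summation index over the rows of $W$, not the column count from the question) and therefore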

$$ \begin{align} \left[\frac{\partial}{\partial W}(W^TW)\right]_{ijkl} & = \frac{\partial}{\partial W_{ij}}(W_{mk}W_{ml}) \\ & = \frac{\partial W_{mk}}{\partial W_{ij}} W_{ml} + W_{mk} \frac{\partial W_{ml}}{\partial W_{ij}} \\ & = \delta_{im} \delta_{jk} W_{ml} + \delta_{im}\delta_{jl}W_{mk} \\ & = \delta_{jk} W_{il} + \delta_{jl} W_{ik} \end{align} $$
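The closed form above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `gram_derivative` and `gram_derivative_fd` are made up for this illustration. It builds the rank-4 tensor $\delta_{jk} W_{il} + \delta_{jl} W_{ik}$ with `einsum` and compares it against a central finite-difference estimate.

```python
import numpy as np

def gram_derivative(W):
    """Closed form: T[i, j, k, l] = d(W^T W)_{kl} / dW_{ij}."""
    n_cols = W.shape[1]
    eye = np.eye(n_cols)
    # delta_{jk} W_{il} + delta_{jl} W_{ik}
    return np.einsum("jk,il->ijkl", eye, W) + np.einsum("jl,ik->ijkl", eye, W)

def gram_derivative_fd(W, eps=1e-6):
    """Central finite-difference estimate of the same tensor."""
    n_rows, n_cols = W.shape
    T = np.zeros((n_rows, n_cols, n_cols, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            E = np.zeros_like(W)
            E[i, j] = eps  # perturb only the entry W_{ij}
            T[i, j] = ((W + E).T @ (W + E) - (W - E).T @ (W - E)) / (2 * eps)
    return T

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
T = gram_derivative(W)
print(T.shape)  # (4, 3, 3, 3)
print(np.allclose(T, gram_derivative_fd(W), atol=1e-6))
```

The first two axes of `T` follow the shape of $W$ and the last two follow the shape of $W^TW$, which is why the result cannot be stored as an ordinary matrix without contracting or flattening indices.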

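If you now choose to contract over a pair of indices you get a rank-2 tensor (a matrix). For example, contracting over $j$ and $k$ gives

$$ \delta_{jj} W_{il} + \delta_{jl} W_{ij} = (\delta_{jj}+1) W_{il} $$

where $\delta_{jj}$ is the number of values the contracted index takes. Since $j$ and $k$ are column indices of $W$, this is the number of columns of $W$: for $W \in \mathbb{R}^{n\times m}$ as in the question, the contraction equals $(m+1)W_{il}$ (with $m$ here the column count, not the dummy summation index used in the derivation above).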
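The contraction can be sketched in a few lines, again assuming NumPy. The variable names `n_rows` and `n_cols` are illustrative only. Repeating a subscript inside a single `einsum` operand sums over that diagonal, which is exactly the contraction of $j$ with $k$; the trace of the Kronecker delta is then the number of values $j$ takes, i.e. the number of columns of $W$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rows, n_cols = 5, 3
W = rng.standard_normal((n_rows, n_cols))

# Rank-4 derivative tensor: delta_{jk} W_{il} + delta_{jl} W_{ik}
eye = np.eye(n_cols)
T = np.einsum("jk,il->ijkl", eye, W) + np.einsum("jl,ik->ijkl", eye, W)

# Contract j with k by repeating the subscript j in the input.
contracted = np.einsum("ijjl->il", T)

# delta_{jj} counts the columns of W, so the factor is n_cols + 1.
print(np.allclose(contracted, (n_cols + 1) * W))
```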

If you need to read up about index notation you might want to take a look at this set of example questions and answers, which I found very helpful when I was learning it for the first time.

To apply this to the second part of your question you apply the multivariable chain rule as normal.
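One way to carry out that chain-rule step, offered as a sketch rather than as part of the original answer: write $B = W^TW$, so that $\|B\|_\mathcal{F}^2 = B_{kl}B_{kl}$ with summation over repeated indices, and use the symmetry of $B$:

$$ \begin{align} \frac{\partial}{\partial W_{ij}} \left(B_{kl}B_{kl}\right) & = 2 B_{kl} \frac{\partial B_{kl}}{\partial W_{ij}} \\ & = 2 B_{kl}\left(\delta_{jk} W_{il} + \delta_{jl} W_{ik}\right) \\ & = 2 W_{il}B_{jl} + 2 W_{ik}B_{kj} \\ & = 4 (W W^T W)_{ij} \end{align} $$

A numerical check of this candidate gradient, assuming NumPy; the function names are hypothetical and chosen for this example:

```python
import numpy as np

def frob_sq_of_gram(W):
    """f(W) = ||W^T W||_F^2, the sum of squared entries of W^T W."""
    return np.sum((W.T @ W) ** 2)

def grad_closed_form(W):
    """Candidate gradient 4 W W^T W from the index derivation."""
    return 4.0 * W @ W.T @ W

def grad_fd(f, W, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at W."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = eps  # perturb one entry at a time
        G[idx] = (f(W + E) - f(W - E)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 3))
print(np.allclose(grad_closed_form(W), grad_fd(frob_sq_of_gram, W),
                  rtol=1e-5, atol=1e-6))
```

Because the function is scalar-valued, its derivative with respect to $W$ has the same shape as $W$, unlike the rank-4 derivative of $W^TW$ itself.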

  • 0
    Thank you very much for the answer. I need to read up a bit more on the index notation first to be able to follow each step. I will let you know if I don't understand some steps afterwards. Thanks. 2011-11-10
  • 1
    When I was learning suffix notation (waaaaay back) I found this sheet of example questions very helpful: http://www.damtp.cam.ac.uk/user/examples/A1f.pdf 2011-11-10
  • 0
    That's great! Thank you very much. 2011-11-11
1

The best reference on this and similar problems is the book

J. R. Magnus and H. Neudecker, "Matrix Differential Calculus with Applications in Statistics and Econometrics", Wiley.

0

This is a pretty confused question.

$$\|X\|_\mathcal{F}^2 = \text{Tr}(X X^{T}) = \sum_{kl}X_{kl}^2$$

is the square of the Frobenius norm for matrices.

The Fréchet derivative w.r.t. $X$ of this can be found by going to index notation:

$$\left[\frac{\partial}{\partial X}\left(\text{Tr}(X X^{T})\right)\right]_{ij} = \frac{\partial}{\partial X_{ij}}\left(\sum_{kl}X_{kl}^2\right) = \sum_{kl}2 X_{kl} \delta_{ik}\delta_{jl} = 2 X_{ij} \; .$$
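As a quick sanity check of this result, here is a minimal sketch assuming NumPy. It estimates each partial derivative $\partial/\partial X_{ij}$ of $\text{Tr}(XX^T)$ by central differences and compares the resulting matrix with $2X$. The names `f`, `G` and `eps` are illustrative.

```python
import numpy as np

def f(M):
    """Squared Frobenius norm written as Tr(M M^T)."""
    return np.trace(M @ M.T)

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 4))

eps = 1e-6
G = np.zeros_like(X)
for idx in np.ndindex(*X.shape):
    E = np.zeros_like(X)
    E[idx] = eps  # perturb only X_{ij}
    G[idx] = (f(X + E) - f(X - E)) / (2 * eps)

# The entrywise derivatives should reproduce the matrix 2X.
print(np.allclose(G, 2 * X, atol=1e-6))
```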

In the same way, one can work out

$$\frac{\partial}{\partial W}W^TW \; .$$

The result, however, is not a matrix but a rank-4 tensor.

A further reference about matrix calculus.

  • 0
    Thank you for your answer. Apologies for the confusion. I am new to this topic and struggled to find some good references that could teach me how to do such derivations. For example, I can't find any good text or tutorial on how to differentiate using index notation. If you have any suggestions, please let me know, thank you. 2011-11-10
  • 0
    Hi kimjj, here is a [reference about matrix calculus](http://www.jstor.org/stable/2028606) that could be helpful. 2011-11-10