0

Hi, I am trying to take the derivative of a function involving a matrix trace and am having some trouble. The function is given by

$$ f(X) = |Tr(X X^\top)|^2 $$ where $X$ is a complex matrix and $X^\top$ is the plain (non-conjugating) transpose. I want to take the derivative with respect to $X^H$, the Hermitian transpose of $X$. Thus I have $$ \frac{\partial}{\partial X^H} |Tr(X X^\top)|^2=0\,? $$ Is this zero because $f(X)$ does not depend on $X^H$? If it is not zero, how can we differentiate it? At first I naively thought it was zero. However, I can write the derivative as $$ \frac{\partial}{\partial X^H} \left( Tr(X X^\top)\, \overline{Tr(X X^\top)}\right)=\,? $$ where the bar denotes complex conjugation. (This is the same as $z \overline{z}=|z|^2$ for a complex number $z$.) Written this way, I think the complex conjugation may act inside the trace and turn one of the factors into $X^H$, which would then give a non-zero derivative. Is this wrong?

Note: I'm not sure if it will help, but a similar derivative is given by $$ \frac{\partial}{\partial X^H} \left(Tr(X X^H)\right)^2=2 Tr( X X^H) \frac{\partial}{\partial X^H} Tr( X X^H) = 2 Tr( X X^H) X $$

Thanks!

  • 0
    Taking the derivative with respect to $X^*$ is meaningless. The derivative of $f$ with respect to $X$ makes sense. 2017-01-30
  • 0
    @copper.hat Thanks. Why is that, in mathematical terms? 2017-01-30
  • 0
    Why is what? The derivative of a function is defined in terms of the function's parameters. 2017-01-30
  • 0
    @copper.hat So is the derivative zero? This is from a math class, so I'm not entirely sure it's 'meaningless'; however, it may indeed be zero. 2017-01-30
  • 0
    What derivative? The derivative of $f$ is not zero. 2017-01-30
  • 0
    @copper.hat The derivative in the question, which is clearly stated to be with respect to $X^H$. 2017-01-30
  • 0
    See my first comment. 2017-01-30
  • 0
    @copper.hat Your first comment, like all the others, is essentially meaningless to me. 2017-01-30
  • 0
    The derivative with respect to $X^*$ of a function depending only on $X, X^\top$ is zero, just as differentiating a function $f(x)$ with respect to $y$ gives zero. 2017-01-30
  • 0
    Please point me to a **single** definition of the Fréchet derivative with respect to anything other than the parameter in question ($X$ here). 2017-01-30
  • 0
    One simple example is here (see the answer): http://math.stackexchange.com/questions/2060542/matrix-trace-derivatives 2017-01-30
  • 0
    I see no definition or single example there. The idea of the derivative is to find a linear approximation of the function in a prescribed sense at a point. It makes no sense to have a function $X \mapsto f(X)$ and talk about the linear approximation in terms of $X^*$. 2017-01-30
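
A scalar sketch may help separate the two viewpoints in the comments above. It assumes the Wirtinger convention, in which $z$ and $\overline{z}$ are treated as independent variables with $\frac{\partial}{\partial \overline{z}} = \tfrac12\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right)$ for $z = x + iy$; that convention is an assumption here, not something the question states. For a $1\times 1$ matrix $X = z$ we have $Tr(XX^\top) = z^2$, so

$$\eqalign{ |z^2|^2 &= z^2\,\overline{z}^2, \qquad \frac{\partial}{\partial \overline{z}}\left(z^2\,\overline{z}^2\right) = 2 z^2\,\overline{z} \cr (z^2)^2 &= z^4, \qquad\quad\; \frac{\partial}{\partial \overline{z}}\, z^4 = 0 }$$

Under that convention the squared modulus has a non-zero derivative with respect to $\overline{z}$, while the plain square does not.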

1 Answer

2

Write the function in terms of the Frobenius product $A:B = \operatorname{Tr}(A^\top B)$, then find its differential and its gradients $$\eqalign{ f &= (X:X)^*\,(X:X) \cr &= (X^H:X^H)\,(X:X) \cr\cr df &= 2(X^H:X^H)\,X:dX + 2(X:X)\,X^H:dX^H \cr\cr \frac{\partial f}{\partial X^H} &= 2(X:X)\,X^H, \qquad \frac{\partial f}{\partial X} = 2(X^H:X^H)\,X }$$

I used the fact that $X^*:X^* = X^H:X^H$ on the second line, since transposing both operands in a Frobenius product leaves it unchanged.
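
For readers less used to differentials, the step from $f$ to $df$ can be spelled out roughly as follows. This is only a sketch: it uses the product rule together with the fact that the Frobenius product is bilinear and symmetric ($A:B = B:A$), and it treats $X$ and $X^H$ as independent variables, which is an assumption made explicit here rather than in the answer.

$$\eqalign{ d(X:X) &= dX:X + X:dX = 2\,X:dX \cr d(X^H:X^H) &= 2\,X^H:dX^H \cr\cr df &= (X:X)\,d(X^H:X^H) + (X^H:X^H)\,d(X:X) \cr &= 2(X:X)\,X^H:dX^H + 2(X^H:X^H)\,X:dX }$$

Reading off the factor multiplying $dX^H$ gives the gradient with respect to $X^H$, and the factor multiplying $dX$ gives the gradient with respect to $X$.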

  • 0
    Thanks so much, this is very helpful. I understand what you did on the second line. I just want to make sure I'm clear on something, since I initially thought the derivative was zero: this derivative is non-zero because of the complex-conjugate term. However, if the function were a plain square rather than a squared modulus, the derivative would be zero, right? What I mean is: $$ \frac{\partial}{\partial X^H} (Tr(X X^\top))^2=0 $$ since the function now depends only on $X, X^\top$. I really appreciate your help on multiple occasions. Thank you. 2017-01-31
  • 1
    @Integrals That's right, the gradient wrt $X^H$ is zero if there's no dependence on $X^*$ or $X^H$. 2017-01-31
  • 0
    Thanks again for your time and help. I'm learning a lot from how you do these kinds of problems. Often (or always) it seems you work strictly in terms of differentials, as opposed to the chain rule. I'm not used to working with differentials but will start getting into the habit. I can see how powerful and useful they are when calculating gradients and such. Thanks again. 2017-01-31
  • 0
    Is the gradient wrt $X^H$ always equal to the gradient wrt $X^*$, where $*$ denotes complex conjugation? Thanks. 2017-02-04
  • 0
    @Integrals If the differential wrt $X^H$ is $$df=G:dX^H$$ then you are free to transpose both factors in the Frobenius product to obtain $$df=G^T:dX^*$$ So it appears that the gradients in question are transposes of each other. Only in the case that $G$ is symmetric are they equal. 2017-02-04
  • 0
    That's really helpful, thanks a lot. 2017-02-04
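
The gradients in the answer, and the transpose relation from the last comments, can be sanity-checked numerically. The sketch below is not part of the original thread: it assumes NumPy, takes $A:B$ to mean $\operatorname{Tr}(A^\top B)$ with no conjugation, and uses hypothetical helper names (`frob`, `f`). It compares the first-order change predicted by $df = \frac{\partial f}{\partial X}:dX + \frac{\partial f}{\partial X^H}:dX^H$ with a central finite difference along a random direction.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = Tr(A^T B), with no complex conjugation (assumed convention)."""
    return np.sum(A * B)

def f(X):
    """f(X) = |Tr(X X^T)|^2 = |X:X|^2."""
    return np.abs(frob(X, X)) ** 2

rng = np.random.default_rng(0)
n = 3
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
dX = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

XH = X.conj().T
grad_X = 2 * frob(XH, XH) * X    # claimed gradient with respect to X
grad_XH = 2 * frob(X, X) * XH    # claimed gradient with respect to X^H

# First-order change predicted by df = grad_X : dX + grad_XH : dX^H.
# The two terms are complex conjugates of each other, so the sum is real.
predicted = (frob(grad_X, dX) + frob(grad_XH, dX.conj().T)).real

# Central finite difference of f along the direction dX.
t = 1e-6
numeric = (f(X + t * dX) - f(X - t * dX)) / (2 * t)

# Transposing both factors: grad_XH : dX^H equals grad_XH^T : dX^*.
lhs = frob(grad_XH, dX.conj().T)
rhs = frob(grad_XH.T, dX.conj())

print("predicted vs numeric:", predicted, numeric)
print("transpose relation holds:", np.isclose(lhs, rhs))
```

If the two printed derivative values agree to several digits, that supports the formulas for both gradients at once, since a wrong $X^H$ term would change the predicted value.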