
I am taking the derivative of

$$X^TAX, X \in {\rm I\!R}^n$$

using the Fréchet derivative, where $$f(x + h) = f(x) + \langle \nabla f(x), h \rangle + o(\|h\|)$$.

So I have

$$f(x + h) = X^TAX + X^TAh + h^TAX +h^TAh$$

and, combining the two middle terms (each is a scalar, so $h^TAX = X^TA^Th$), I have

$$X^TAh + h^TAX = X^T(A + A^T)h$$

and I think this $X^T(A+A^T)$ is the derivative of $X^TAX$. However, $X$ is an $n \times 1$ vector, while $X^T(A+A^T)$ is a $1 \times n$ vector. Am I doing anything wrong here? I saw that some matrix calculus references also give this answer. I don't know what is happening.

So the problem is that if I am doing gradient descent, I will have to compute $x - \nabla f(x)$, but the dimensions don't match, so I think there must be something wrong.

  • 0
    The way you use the notation $<\cdot>$ is kind of unusual and inconsistent. 2017-02-13
  • 0
    $X^TAX$ is a $1\times 1$ vector. $\nabla X^TAX$ is a $1\times n$ vector. 2017-02-13
  • 0
    @user251257 I'm sorry. It's the inner product. 2017-02-13
  • 0
    @Doug M Yes. My problem is that $X$ is an $n \times 1$ vector, while the gradient is $1 \times n$. Isn't this weird? If I am doing gradient descent, I have to compute $x - \nabla f(x)$. If the dimensions don't match, how can I do gradient descent? 2017-02-13
  • 0
    @user3716774 In such a context, the gradient is usually the **transpose** of the Jacobian. So just transpose your result. 2017-02-13
  • 0
    @user3716774 Thank you very much! So both $X^T(A^T + A)$ and its transpose are the gradient of the function? Or are there some rules in this context? Do you have a reference that I can look into? Thank you again. 2017-02-13
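
To illustrate the dimension question, here is a minimal sketch of gradient descent on $f(X) = X^TAX$ that uses the transposed result $(A + A^T)X$ as the gradient, so that the update subtracts one length-$n$ vector from another. The function names, the step size `step`, the iteration count, and the example matrix are all assumptions made for illustration, not something from the question. The example also assumes the symmetric part of $A$ is positive definite, so the iterates should approach the minimizer at the origin.

```python
def gradient(A, x):
    """Transpose of the row vector x^T (A + A^T): the column vector (A + A^T) x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]


def gradient_descent(A, x0, step=0.1, iterations=100):
    """Repeat x <- x - step * gradient(x); both vectors have n entries."""
    x = list(x0)
    for _ in range(iterations):
        g = gradient(A, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical example: the symmetric part of A is positive definite,
# so x^T A x has its minimum at the origin.
A = [[2.0, 1.0],
     [0.0, 1.0]]
x_final = gradient_descent(A, [1.0, -1.0])
print(x_final)
```

With the row-vector form $X^T(A + A^T)$ the subtraction would not be defined entry by entry against a column $X$, which is why the transposed form is the one used in the update.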

1 Answer

1

Let $f : \mathbb R^n \to \mathbb R$ be defined by $f (\mathrm x) := \mathrm x^{\top} \mathrm A \, \mathrm x$, where $\mathrm A \in \mathbb R^{n \times n}$ is given. Hence,

$$f (\mathrm x + h \mathrm v) = (\mathrm x + h \mathrm v)^{\top} \mathrm A \, (\mathrm x + h \mathrm v) = f (\mathrm x) + h \langle \mathrm v, \mathrm A \, \mathrm x \rangle + h \langle \mathrm A^{\top} \mathrm x, \mathrm v \rangle + h^2 \mathrm v^{\top} \mathrm A \, \mathrm v$$

The directional derivative of $f$ in the direction of $\mathrm v$ at $\mathrm x$ is, thus,

$$D_{\mathrm v} f (\mathrm x) = \langle \mathrm v, \mathrm A \, \mathrm x \rangle + \langle \mathrm A^{\top} \mathrm x, \mathrm v \rangle = \langle \mathrm v, (\mathrm A + \mathrm A^{\top}) \, \mathrm x \rangle$$

and the gradient of $f$ is

$$\boxed{\quad \nabla f (\mathrm x) = (\mathrm A + \mathrm A^{\top}) \, \mathrm x = 2\left(\frac{\mathrm A + \mathrm A^{\top}}{2}\right) \, \mathrm x \quad}$$

where $\dfrac{\mathrm A + \mathrm A^{\top}}{2}$ is the symmetric part of $\mathrm A$.
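
As a sanity check on the boxed formula, the following sketch compares $(\mathrm A + \mathrm A^{\top}) \, \mathrm x$ with a central finite-difference approximation of the partial derivatives of $f$. It uses plain Python lists; the helper names, the test matrix (chosen to be non-symmetric on purpose), the test vector, and the perturbation size `eps` are assumptions for illustration only.

```python
def quad_form(A, x):
    """Return f(x) = x^T A x for a square matrix A (list of rows) and vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def grad_closed_form(A, x):
    """Return (A + A^T) x as a list of length n."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]


def grad_finite_diff(A, x, eps=1e-6):
    """Approximate each partial derivative of f with a central difference."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * eps))
    return grad


# Hypothetical test data: a deliberately non-symmetric A.
A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, 2.0]]
x = [1.0, -2.0, 0.5]

exact = grad_closed_form(A, x)
approx = grad_finite_diff(A, x)
max_error = max(abs(e - a) for e, a in zip(exact, approx))
print(exact, approx, max_error)
```

Because $f$ is quadratic, the central difference has no truncation error, so the two results should agree up to floating-point rounding.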

  • 1
    Oh! So the derivative I took was wrong! Thank you very much! 2017-02-15
  • 0
    How did you derive the directional derivative of $f$ just from $f(x+hv)$? 2017-12-04
  • 0
    @CharlieParker I took the first-order terms and divided by $h$. 2017-12-04
  • 0
    One last thing I forgot: how did you convert from the directional derivative to the gradient? Plugging in $v = [1,1,1,1,1,...,1]$ can't be the right answer, right? 2017-12-04
  • 0
    @CharlieParker I am just using the definitions of directional derivative and gradient. The inner product of $\rm v$ and the gradient of $f$ gives us the directional derivative of $f$ in the direction of $\rm v$. 2017-12-04
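
To spell out that last step, here is one way to read it, with $\mathrm e_i$ denoting the $i$-th standard basis vector (notation introduced here, not used in the answer). Choosing $\mathrm v = \mathrm e_i$ in the directional derivative picks out the $i$-th component of the gradient:

$$D_{\mathrm e_i} f (\mathrm x) = \langle \mathrm e_i, (\mathrm A + \mathrm A^{\top}) \, \mathrm x \rangle = \big[ (\mathrm A + \mathrm A^{\top}) \, \mathrm x \big]_i = \frac{\partial f}{\partial x_i} (\mathrm x), \qquad i = 1, \dots, n$$

So the gradient is the vector $\mathrm g$ satisfying $D_{\mathrm v} f (\mathrm x) = \langle \mathrm v, \mathrm g \rangle$ for every $\mathrm v$, which forces $\mathrm g = (\mathrm A + \mathrm A^{\top}) \, \mathrm x$. Plugging in the all-ones vector would only give the sum of the components of $\mathrm g$, not $\mathrm g$ itself.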