
I am told that, under a change of coordinates, $\nabla (f(Ax)) = A^T(\nabla f)(Ax)$. However, I read both sides as "the gradient of $f$, where $f$ is a function of $Ax$, $A$ is a matrix, and $x$ is a vector", with the right-hand side additionally multiplied by $A^T$.

This is obviously incorrect, so how am I misunderstanding the notation?

Thanks, Ash

1 Answer


There can be a bit of subtlety whenever the gradient operator appears, because it is a differential operator, but the notation doesn't always make it clear what variable the differentiation is with respect to.

On the left, you are differentiating $f(Ax)$ with respect to $x$. This is not the gradient of $f$ per se, but of the transformed function $x \mapsto f(Ax)$. On the right, $\nabla f$ is the gradient of $f$. So $(\nabla f)(Ax)$ is the gradient of $f$ evaluated at $Ax$; in other words, you are differentiating $f$ with respect to its argument (which here is $Ax$), not with respect to $x$.
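To see where the $A^T$ comes from, it may help to write the chain rule out in components. The names $g$ and $y$ below are introduced only for this sketch: let $g(x) = f(Ax)$ and $y = Ax$, so that $y_j = \sum_k A_{jk} x_k$ and $\partial y_j / \partial x_i = A_{ji}$. Then

$$
\begin{aligned}
\big(\nabla g(x)\big)_i = \frac{\partial}{\partial x_i} f(Ax)
&= \sum_j \frac{\partial f}{\partial y_j}(Ax)\,\frac{\partial y_j}{\partial x_i} \\
&= \sum_j A_{ji}\,\big((\nabla f)(Ax)\big)_j \\
&= \big(A^T (\nabla f)(Ax)\big)_i .
\end{aligned}
$$

The left-hand side differentiates with respect to the components of $x$, while $(\nabla f)(Ax)$ collects the partial derivatives of $f$ with respect to its own argument, evaluated at the point $Ax$.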

It might help to think of the analogous one-dimensional equation: $$\frac{d}{dx} f(ax) = a f'(ax).$$
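The identity $\nabla (f(Ax)) = A^T(\nabla f)(Ax)$ can also be checked numerically. The following is only a rough sketch in plain Python: the function `f`, the matrix `A`, the point `x`, and the helper names are all made up for illustration and do not come from the question. The left-hand side is approximated by finite differences in $x$, and the right-hand side uses the exact gradient of `f` evaluated at $Ax$.

```python
import math

# Hypothetical example data: A is 3x2, so x is a 2-vector and Ax is a 3-vector.
A = [[1.0, 2.0],
     [0.0, -1.0],
     [3.0, 0.5]]
x = [0.3, -0.7]

def matvec(M, v):
    # Matrix-vector product M v, with M stored as a list of rows.
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def transpose(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(col) for col in zip(*M)]

def f(y):
    # Illustrative scalar function of a 3-vector y (an assumption for this sketch).
    return sum(yi * yi for yi in y) + math.sin(y[0])

def grad_f(y):
    # Exact gradient of f with respect to its own argument y.
    g = [2.0 * yi for yi in y]
    g[0] += math.cos(y[0])
    return g

def numerical_gradient(func, v, h=1e-6):
    # Central finite differences, one coordinate of v at a time.
    grad = []
    for i in range(len(v)):
        plus = list(v)
        plus[i] += h
        minus = list(v)
        minus[i] -= h
        grad.append((func(plus) - func(minus)) / (2.0 * h))
    return grad

# Left-hand side: gradient of the transformed function x -> f(Ax), taken in x.
lhs = numerical_gradient(lambda v: f(matvec(A, v)), x)

# Right-hand side: A^T times (grad f) evaluated at Ax.
rhs = matvec(transpose(A), grad_f(matvec(A, x)))

print(lhs)
print(rhs)
```

Choosing a non-square `A` makes the roles visible: $(\nabla f)(Ax)$ has three components, while the gradient with respect to $x$ has two, and $A^T$ is what maps one to the other.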