
Say $E = f(X)$. When $X \to X+\delta$, where $\delta$ is a vector with $\|\delta\| \to 0$, then $\Delta E \approx f(X) +\delta^T \nabla_X f(X)$.

Is this correct? My question is: by the definition of the gradient, the gradient should be the direction that increases the function value the most. But here I am not moving $X$ along that direction. How come the equation above is correct?

Recall that in 1-D, the derivative tells you what increase you will get if you "move in this direction that I tell you".

But now I am not moving along the gradient direction. Why is this still the approximation for $\Delta E$?

My thought is that the equation should be $\Delta E = f(X) +\delta^T \nabla_{\delta}f(X)$

Am I missing something?

  • Is $f: \Bbb R^n \to \Bbb R$ (or $\Bbb C^n \to \Bbb C$)? In that case, $\Delta E$ should probably be $\delta^T \nabla_X f(X)$. 2012-10-15

1 Answer


Recall the gradient is defined (for $f: \Bbb R^n \to \Bbb R$) as:

$\nabla_X f = \nabla f(X) := \begin{pmatrix}\partial_1 f(X) \\ \vdots \\ \partial_n f(X)\end{pmatrix}$

where $\partial_i f := \dfrac{\partial f}{\partial x_i}$ with respect to the standard basis on $\Bbb R^n$. For a real-valued function $f$, it is the transpose of the Jacobian matrix.

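With this in mind, $\delta^T \nabla f(X) = \left<\delta, \nabla f(X)\right>$ is what we obtain by linearly approximating the change in $f$ when we move from $X$ by the vector $\delta$; it accounts for both the direction and the length of $\delta$.

This is a correct first-order approximation of the change, $\Delta E = f(X+\delta) - f(X) \approx \delta^T \nabla f(X)$, as $\|\delta\| \to 0$. The extra $f(X)$ term belongs in the approximation of $f(X+\delta)$ itself, not of $\Delta E$.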
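To make this concrete, here is a minimal numerical sketch in plain Python. The function `f`, the point `X`, and the direction `(1, -1)` are hypothetical choices for illustration only, not anything from the question. It compares the actual change $f(X+\delta) - f(X)$ with the estimate $\delta^T \nabla f(X)$ for a step that is deliberately not parallel to the gradient.

```python
def f(x):
    # Hypothetical example function f: R^2 -> R, chosen only for illustration.
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def grad_f(x):
    # Analytic gradient of the example f: (df/dx0, df/dx1).
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]

def dot(u, v):
    # Standard inner product <u, v>.
    return sum(a * b for a, b in zip(u, v))

def actual_change(x, delta):
    # Delta E = f(X + delta) - f(X)
    shifted = [xi + di for xi, di in zip(x, delta)]
    return f(shifted) - f(x)

def linear_change(x, delta):
    # First-order estimate of Delta E: delta^T grad f(X)
    return dot(delta, grad_f(x))

X = [1.0, 2.0]          # the gradient here is (8, 3)
direction = [1.0, -1.0]  # deliberately NOT parallel to the gradient

for scale in (1e-1, 1e-2, 1e-3):
    delta = [scale * d for d in direction]
    exact = actual_change(X, delta)
    estimate = linear_change(X, delta)
    # The gap shrinks like scale**2, i.e. faster than the step itself.
    print(scale, exact, estimate, abs(exact - estimate))
```

For this particular `f` the gap between the two numbers works out to $2\,\text{scale}^2$, so the estimate becomes relatively more accurate as the step shrinks, even though the step never points along the gradient.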

Your confusion seems to arise from the expression $\nabla_X f(X)$, which is not correct.

  • Please note that $\delta^T \nabla f(X) = Df(X)(\delta)$ in the notation of multivariable calculus; alternatively, it is $\|\delta\| \cdot D_{\hat \delta} f(X)$, where $\hat \delta = \dfrac{\delta}{\|\delta\|}$ is the direction of $\delta$. The gradient does *not only* determine the direction in which $f(X)$ increases the most; it also lets you compute the rate of change in any other direction (these two facts fit together because the inner product of two vectors of fixed lengths is largest when they point in the same direction). Please read [this Wikipedia article](http://en.wikipedia.org/wiki/Gradient) thoroughly. 2012-10-16
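The remark about collinear vectors can be spelled out in one line. As a sketch, assume $\nabla f(X) \neq 0$ and let $\theta$ (a symbol introduced here, not used above) be the angle between $\delta$ and $\nabla f(X)$:

$\delta^T \nabla f(X) = \|\delta\| \, \|\nabla f(X)\| \cos\theta, \qquad D_{\hat\delta} f(X) = \|\nabla f(X)\| \cos\theta \le \|\nabla f(X)\|$

Equality holds exactly when $\theta = 0$, that is, when $\hat\delta$ points along the gradient. For any other direction the same formula still gives the rate of change: it is smaller by the factor $\cos\theta$, zero when $\delta$ is orthogonal to $\nabla f(X)$, and negative when $\theta > 90^\circ$.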