6

I think everyone knows the directional derivative formula $D_vf=\nabla f\cdot v$, but why is it actually true? As far as I know, the derivative is $\displaystyle\lim_{h\rightarrow 0} \frac{f(x+h)-f(x)}{h}$, but this is a scalar, not a vector. So why does the formula hold?

  • 1
    That's the definition of the derivative for a (possibly vector-valued) function of a scalar variable. What you want is the definition of the derivative for a scalar-valued function of a vector variable. 2012-01-08

4 Answers

0

The approach I find most satisfying, assuming you know the one-variable theory, is as follows. If $f:\mathbb{R}^n \rightarrow \mathbb{R}$, $x\in \mathbb{R}^n$ is a point and $v\in \mathbb{R}^n$ is a vector, then by definition $Df(x)(v)= \lim_{t\rightarrow 0}\frac{f(x+tv)-f(x)}{t}$ is the directional derivative of $f$ at $x$ in direction $v$, if the limit exists. This is the derivative at $t=0$ of the function $t\mapsto f(x+tv)$ from $\mathbb{R}$ to $\mathbb{R}$, so it's covered by the one-dimensional theory.

Now if $f$ is sufficiently nice (for example, differentiable at $x$) and $x$ is fixed, one can check that $v\mapsto Df(x)(v)$ is a linear map $\mathbb{R}^n \rightarrow \mathbb{R}$. From linear algebra it is known that this implies there is a unique vector $w= w(f,x)$ such that $Df(x)(v) = \langle w,v\rangle$ for all $v$, with $\langle\cdot,\cdot\rangle$ denoting the scalar product in $\mathbb{R}^n$. Then, by definition, the gradient of $f$ at $x$ is the vector $\nabla f(x) := w$.

This approach has the advantage of carrying over to more general situations (Riemannian manifolds, Hilbert manifolds) and of being conceptually clean. In particular, it does not depend on a coordinate system or on any other choice, e.g. a basis of Euclidean space or of its dual. It has the disadvantage that the gradient is defined by an existence theorem which is not that explicit, so all the usual properties have to be derived using this existence theorem (which is sometimes rather abstract).
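To make the two claims above concrete (linearity of $v\mapsto Df(x)(v)$ and representation by a single vector $w$), here is a rough numerical sketch in plain Python. The function `f`, the point `x` and the helper `directional` are my own choices for illustration and are not part of the answer. Recovering `w` from the standard basis is only a convenient way to check the statement, and it does use coordinates.

```python
import math

def f(p):
    # Hypothetical smooth test function on R^2 (an assumption for illustration).
    x, y = p
    return math.sin(x) * y + x * x

def directional(f, x, v, t=1e-6):
    # Symmetric difference quotient approximating Df(x)(v),
    # i.e. the derivative of t -> f(x + t v) at t = 0.
    fwd = f([xi + t * vi for xi, vi in zip(x, v)])
    bwd = f([xi - t * vi for xi, vi in zip(x, v)])
    return (fwd - bwd) / (2 * t)

x = [0.7, -1.3]
u = [1.0, 2.0]
v = [-0.5, 3.0]
a, b = 2.0, -4.0

# Linearity: Df(x)(a u + b v) should match a Df(x)(u) + b Df(x)(v).
combo = [a * ui + b * vi for ui, vi in zip(u, v)]
lhs = directional(f, x, combo)
rhs = a * directional(f, x, u) + b * directional(f, x, v)

# Recover the representing vector w by evaluating on the standard basis,
# then compare <w, v> with Df(x)(v).
w = [directional(f, x, [1.0, 0.0]), directional(f, x, [0.0, 1.0])]
inner = sum(wi * vi for wi, vi in zip(w, v))

print(abs(lhs - rhs) < 1e-6, abs(directional(f, x, v) - inner) < 1e-6)
```

Both comparisons hold only up to finite-difference error, so this is a sanity check rather than a proof.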

  • 0
    Yes, I should have replaced $L$ with its restriction to $(\ker L)^\perp$. I agree that your description is the best way to describe the gradient. Although I think you only need a pseudo-inner product (i.e. a nondegenerate symmetric bilinear form) rather than an inner product. 2012-01-09
4

You are probably asking yourself this question because calculus courses focus on computing rather than on proving. Reading $D_v f = \nabla f \cdot v$ as a scalar product is not really the right way to understand why this is the logical definition. It is better to see $\nabla f$ as a linear transformation, and $\nabla f \cdot v$ as the evaluation of this linear transformation at $v$. I'll try to make myself clearer.

In one dimension, when one computes the derivative, one obtains a real number. This real number is in one-to-one correspondence with a linear transformation, $D_f(x)$, which associates to every number $v$ the new number $D_f(x) \cdot v$. In other words, this could be considered as a one-dimensional directional derivative (taking numbers different from $1$, as in the $2$-dimensional case, isn't very pertinent; we can restrict ourselves to vectors of norm $1$, hence $\pm 1$ are the only interesting cases). So it makes sense that in the direction $-1$ we obtain $-D_f(x)$, since in the opposite direction of the slope, the variation is minus the variation we would have in the positive direction of the slope.

Over $\mathbb R^n$, a function $f : \mathbb R^n \to \mathbb R$ is defined to be differentiable at a point $x$ when there exists a linear transformation $L(x)$ (or a $1 \times n$ matrix, if you are not familiar with such concepts) such that $ \lim_{v \to 0} \frac{ f(x + v) - f(x) - L(x)v }{\| v \|} = 0. $ In this case $L(x)$ is said to be the derivative of $f$ at $x$.
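If you want to see this limit at work, here is a small sketch in plain Python. The function `f`, the point `(x0, y0)` and the hand-computed candidate `L` are assumptions I picked for illustration. It only shrinks $v$ along one fixed direction, whereas the definition asks for the limit over all $v \to 0$, so treat it as a plausibility check.

```python
import math

def f(x, y):
    # Hypothetical test function (an assumption for illustration).
    return x * x * y + math.exp(y)

def L(x, y):
    # Candidate derivative at (x, y): the row of partial derivatives
    # (2xy, x^2 + e^y), computed by hand for the f above.
    return (2 * x * y, x * x + math.exp(y))

x0, y0 = 1.0, 0.5
Lx = L(x0, y0)
direction = (0.6, -0.8)  # a fixed unit direction along which v shrinks

ratios = []
for k in range(1, 6):
    s = 10.0 ** (-k)
    v = (s * direction[0], s * direction[1])
    # Numerator of the limit: f(x + v) - f(x) - L(x) v
    remainder = f(x0 + v[0], y0 + v[1]) - f(x0, y0) - (Lx[0] * v[0] + Lx[1] * v[1])
    ratios.append(abs(remainder) / math.hypot(*v))

for r in ratios:
    print(f"{r:.3e}")
```

The printed ratios should shrink as $\|v\|$ shrinks, which is what the definition requires of the linear map $L(x)$.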

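Using Taylor's theorem (or simply taking $v = te_i$ with $t \to 0$ in the limit above, which shows that the $i$-th entry of $L(x)$ is $\frac{\partial f}{\partial x_i}(x)$), one can deduce that $L(x) = \nabla f(x)$ when $f$ is differentiable.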

Note that you can also get a feel for why this definition should hold, and see why $D_v f = \nabla f \cdot v$, in this manner: if you define $g(h) = f(x+hv)$, then $g'(0) = \nabla f(x) \cdot v$ by the chain rule. It's another way to think about it. In two dimensions, writing $v = (\|v\| \cos \theta, \|v\| \sin \theta)$ and letting the subscripts $x$ and $y$ denote the components of $x+hv$, you have \begin{align} g'(0) & = \frac{\partial f}{\partial x} \frac{\partial (x+hv)_x}{\partial h} + \frac{\partial f}{\partial y} \frac{\partial(x+hv)_y}{\partial h} \\ & = \frac{\partial f}{\partial x} \|v\| \cos \theta + \frac{\partial f}{\partial y} \|v\| \sin \theta = \nabla f \cdot (\|v\| \cos \theta, \|v\| \sin \theta) = \nabla f \cdot v. \end{align}

Hope that helps,

  • 0
    I did my small proof with $f : \mathbb R^2 \to \mathbb R$, but the same proof goes for $f : \mathbb R^n \to \mathbb R^m$; you just have more notation to write. I wanted to keep it simple and give the feeling. 2012-01-08
3

Let me just focus on functions of two variables, i.e. $f:\mathbb{R}^2\rightarrow\mathbb{R}$. (You can generalize easily to $n$ variables by replacing $2$ by $n$ in the following.) Then the gradient of $f$ at $(x_0,y_0)$, $\nabla f(x_0,y_0)$, is a $2$-dimensional vector given by $\nabla f(x_0,y_0)=(\frac{\partial f}{\partial x}(x_0,y_0),\frac{\partial f}{\partial y}(x_0,y_0)).$ Given any unit vector $v=(v_1,v_2)$, the directional derivative $D_vf$ of $f$ at the point $(x_0,y_0)$ in the direction $v$ is defined as $D_vf(x_0,y_0)=\lim_{t\rightarrow 0}\frac{f(x_0+tv_1,y_0+tv_2)-f(x_0,y_0)}{t}.$

Therefore, the directional derivative $D_{(1,0)}f$ is nothing but the partial derivative of $f$ with respect to $x$, i.e. $D_{(1,0)}f(x_0,y_0)=\frac{\partial f}{\partial x}(x_0,y_0)$. Similarly, the directional derivative $D_{(0,1)}f$ is nothing but the partial derivative of $f$ with respect to $y$, i.e. $D_{(0,1)}f(x_0,y_0)=\frac{\partial f}{\partial y}(x_0,y_0)$.

Then the formula you have given follows from the chain rule (assuming $f$ is differentiable at $(x_0,y_0)$): $D_vf(x_0,y_0)=\lim_{t\rightarrow 0}\frac{f(x_0+tv_1,y_0+tv_2)-f(x_0,y_0)}{t}=\frac{d}{dt}(f(x_0+tv_1,y_0+tv_2))\big|_{t=0}=\frac{\partial f}{\partial x}(x_0,y_0)v_1+\frac{\partial f}{\partial y}(x_0,y_0)v_2=\nabla f(x_0,y_0)\cdot v.$
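The identity can also be checked numerically. Below is a minimal sketch in plain Python, where the function `f`, the point `(x0, y0)` and the angle `theta` are my own assumptions for illustration. It compares the difference quotient from the definition with $\nabla f(x_0,y_0)\cdot v$, and also checks that the directions $(1,0)$ and $(0,1)$ give the two partial derivatives.

```python
import math

def f(x, y):
    # Hypothetical example function (an assumption, not from the answer).
    return x ** 3 - 2 * x * y + math.cos(y)

def grad_f(x, y):
    # Partial derivatives computed by hand for the f above.
    return (3 * x ** 2 - 2 * y, -2 * x - math.sin(y))

def quotient(x0, y0, v, t):
    # The difference quotient from the definition of D_v f(x0, y0).
    return (f(x0 + t * v[0], y0 + t * v[1]) - f(x0, y0)) / t

x0, y0 = 1.2, 0.4
theta = math.pi / 3
v = (math.cos(theta), math.sin(theta))  # a unit vector

g = grad_f(x0, y0)
dot = g[0] * v[0] + g[1] * v[1]

# As t shrinks, the quotient should approach the dot product.
for t in (1e-1, 1e-3, 1e-5):
    print(t, quotient(x0, y0, v, t), dot)

# Along (1, 0) and (0, 1) the quotient approaches the two partial derivatives.
dx = quotient(x0, y0, (1.0, 0.0), 1e-6)
dy = quotient(x0, y0, (0.0, 1.0), 1e-6)
print(dx, g[0])
print(dy, g[1])
```

The agreement is only up to an error of order $t$, since this uses a one-sided quotient exactly as in the definition.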

Hope that this helps.

  • 0
    Well, I think I get it. Is there any geometrical meaning to the gradient? 2012-01-08
2

I think you are confused about the concept of a total differential, which then trickles down to a misunderstanding of the gradient and directional derivatives. For a function, say $f(x,y,z)$, we define $df$ (not $\Delta f$) as follows: $df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy + \frac{\partial f}{\partial z} dz$.

We can understand the motivation for this definition by looking at some of its properties. I will list three I can think of:

  • Encode how changes in $x,y,z$ affect $f$
  • Placeholder for small variations $\Delta x, \Delta y, \Delta z$ to get approximation formula $\Delta f \approx f_{x} \Delta x + f_{y} \Delta y + f_{z} \Delta z$ (where $f_{i}$ is the partial derivative of $f$ with respect to $i$)
  • Divide by something like $dt$ to get a rate of change. When $x=x(t), y=y(t), z=z(t)$, then $\frac{df}{dt} =\frac{\partial f}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial f}{\partial y} \cdot \frac{dy}{dt} + \frac{\partial f}{\partial z} \cdot \frac{dz}{dt}$ by the chain rule.
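The last two properties are easy to test numerically. Here is a rough sketch in plain Python, where the function `f`, the point `p`, the steps and the curve are assumptions I made up for illustration. It compares $\Delta f$ with $f_x \Delta x + f_y \Delta y + f_z \Delta z$, and $\frac{df}{dt}$ from the chain rule with a difference quotient along the curve.

```python
import math

def f(x, y, z):
    # Hypothetical example function (an assumption for illustration).
    return x * y + z * z * math.sin(x)

def partials(x, y, z):
    # f_x, f_y, f_z computed by hand for the f above.
    return (y + z * z * math.cos(x), x, 2 * z * math.sin(x))

# Approximation formula: Delta f ~ f_x dx + f_y dy + f_z dz for small steps.
p = (0.5, 2.0, -1.0)
step = (1e-3, -2e-3, 5e-4)
fx, fy, fz = partials(*p)
actual = f(p[0] + step[0], p[1] + step[1], p[2] + step[2]) - f(*p)
linear = fx * step[0] + fy * step[1] + fz * step[2]

# Chain rule along the curve (x(t), y(t), z(t)) = (cos t, sin t, t).
def curve(t):
    return (math.cos(t), math.sin(t), t)

def curve_velocity(t):
    # (dx/dt, dy/dt, dz/dt) for the curve above.
    return (-math.sin(t), math.cos(t), 1.0)

t0, h = 0.8, 1e-6
gx, gy, gz = partials(*curve(t0))
vx, vy, vz = curve_velocity(t0)
chain = gx * vx + gy * vy + gz * vz
numeric = (f(*curve(t0 + h)) - f(*curve(t0 - h))) / (2 * h)

print(actual, linear)
print(chain, numeric)
```

The first pair agrees only approximately (the error is second order in the step size), which is exactly why the formula is written with $\approx$.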

So, in general you can think of the total differential $df$ of a function as the thing that encodes how $f$ changes and has the capacity to change.

The gradient vector is defined as the following vector: $\nabla w = \left(\frac{\partial w}{\partial x}, \frac{\partial w}{\partial y}, \frac{\partial w}{\partial z}\right)$. As I posted in my previous response to your question here, we can derive properties such as the gradient vector being perpendicular to the level surface. It seems that you are a little bit stuck in the single-variable mode of thinking. The definition of the derivative as the limit you suggested describes a different concept than the gradient vector. The gradient can be thought of as a vector field attached to the function, and as such it carries more information than the single-variable limit: at each point it records the rate of change of the function in every coordinate direction.

As for understanding the concept of a directional derivative $\frac{dw}{ds} \big|_{\hat{u}}$, I think of it geometrically as the slope of the slice of the graph cut by a vertical plane parallel to $\hat{u}$. Refer to the other responses for additional explanation of some of the math behind directional derivatives.