10
$\begingroup$

On Wikipedia, this is the following description of gradient descent:

Gradient descent is based on the observation that if the multivariable function $F(\mathbf{x})$ is defined and differentiable in a neighborhood of a point $\mathbf{a}$, then $F(\mathbf{x})$ decreases fastest if one goes from $\mathbf{a}$ in the direction of the negative gradient of $F$ at $\mathbf{a}$.

Now I have several doubts in this description. First of all I have an example of $f(x)=x^2$ in my mind and my starting point is, say, $x=5$.

  1. What is the meaning of "decreases fastest"? I mean, I can go straight from $x=5$ to $x=0$ (which is minimum point), then what's the point of fastest decrease? What is the notion of fast here?
  2. Where did this observation come from? I didn't see the proof of this observation.
  • 2
    You might want to see [this](http://books.google.com/books?hl=en&id=uOJ-Vg1BnKgC&pg=PA402).2012-08-09

3 Answers 3

9

In two dimensions, like in the example of $f(x)=x^2$,at the point $(5,25)$ on the curve, there are only two ways to "move" and still be on the curve: to the left and to the right, and in our case, the value of the function decreases as we move to the left, and increases as we go to the right. No problems.

But, when we are on, for example the following function $z=f(x,y)$ which is a two variable function, which looks like the inverted dome below.

<span class=f(x,y)">

Imagine that you are on top of the dome. Then, There are an infinite number of curves(see figure) you can move on and still stay on the surface. And, since our location is so symmetric, it does not matter which direction we move in- it will always be the same rate of change in all directions.

But, suppose we were not on such a symmetric location, say a little lower in any direction. Then, with a little more thought we can see that there is exactly one direction in which you must move in order to achieve maximum descent. That is downward on the curve connecting the point you are on to the top of the dome.


As for rigorous proof of this phenomenon,J.M's link in the comments is an excellent resource.

  • 1
    Suggestio$n$: Assumi$n$g that the co$n$text of the OP is mi$n$imizatio$n$, it might add clarity if the illustratio$n$ was inverted so that it had a definite minimum.2012-08-25
3

The concept of a direction of fastest descent only makes sense in more than one dimension. In your one-dimensional example, you can only go left or right, and the function increases in one direction and decreases in the other; there's nothing else to compare.

In several dimensions, however, you have a continuum of directions to choose from, and the directional derivative is maximal in the direction of the gradient.

2

In your example, the gradient at $x= 5$ is $10$, so the negative gradient is $-10$. Since the negative gradient is negative (not neccesarily the case, thinkk about $x= -5$), it decreases fastest in the negative direction.

The gradient can be interpreted as the slope of a tangent. In the one-variable case, the tangent is a line and the slope can be represented as a single number. In the multivariable case, let's say with a function like $f(x, y, z) = x^2 + zy^3 - \frac{x}{y^2}$, the gradient will be a vector with corresponding to the direction of fastest ascent. Which means the fastest descent will be the opposite direction: the direction of the negative gradient.