Wikipedia gives the following description of gradient descent:
> Gradient descent is based on the observation that if the multivariable function $F(\mathbf{x})$ is defined and differentiable in a neighborhood of a point $\mathbf{a}$, then $F(\mathbf{x})$ decreases fastest if one goes from $\mathbf{a}$ in the direction of the negative gradient of $F$ at $\mathbf{a}$.
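The quoted claim can at least be checked numerically. The sketch below uses a hypothetical function $F(x, y) = x^2 + 3y^2$ and point $\mathbf{a} = (1, 1)$, both chosen only for illustration. It estimates the rate of change of $F$ at $\mathbf{a}$ along 360 sampled unit directions and compares the most negative one with the normalized negative gradient. This is an illustration under those assumptions, not a proof.

```python
import math

def F(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 + 3 * y**2

def grad_F(x, y):
    # Analytic gradient of F.
    return (2 * x, 6 * y)

def directional_derivative(a, u, h=1e-6):
    # Central-difference estimate of the rate of change of F at a along unit vector u.
    ax, ay = a
    ux, uy = u
    return (F(ax + h * ux, ay + h * uy) - F(ax - h * ux, ay - h * uy)) / (2 * h)

a = (1.0, 1.0)

# Sample 360 unit directions, one per degree around the circle.
directions = [
    (math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
    for k in range(360)
]

# The sampled direction along which F decreases at the greatest rate.
best = min(directions, key=lambda u: directional_derivative(a, u))

# Unit vector pointing along the negative gradient at a.
gx, gy = grad_F(*a)
norm = math.hypot(gx, gy)
neg_grad_dir = (-gx / norm, -gy / norm)

print("steepest sampled direction:", best)
print("normalized -grad F(a):     ", neg_grad_dir)
```

The two printed vectors agree up to the one-degree sampling resolution, and the rate of change along the negative gradient direction equals $-\|\nabla F(\mathbf{a})\|$.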
I have a couple of doubts about this description. As a concrete example, take $f(x)=x^2$ with the starting point, say, $x=5$.
- What does "decreases fastest" mean? I could go straight from $x=5$ to $x=0$ (the minimum point), so what is the point of the fastest decrease? What notion of "fast" is being used here?
- Where does this observation come from? I have not seen a proof of it.
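For the example above, $f(x)=x^2$ starting from $x=5$, the usual gradient descent update is $x_{k+1} = x_k - \eta f'(x_k)$. The step size $\eta = 0.1$ and the number of steps in this sketch are arbitrary assumptions made only for illustration.

```python
def f(x):
    return x ** 2

def f_prime(x):
    # Derivative of f; in one dimension the gradient is just f'(x).
    return 2 * x

def gradient_descent(x0, step_size=0.1, num_steps=50):
    # Repeatedly step against the derivative: x <- x - step_size * f'(x).
    xs = [x0]
    x = x0
    for _ in range(num_steps):
        x = x - step_size * f_prime(x)
        xs.append(x)
    return xs

trajectory = gradient_descent(5.0)
print(trajectory[:4])  # first few iterates
print(trajectory[-1])  # close to the minimizer x = 0
```

With these settings each step multiplies $x$ by $1 - 2\eta = 0.8$, so the iterates shrink toward $0$. The negative gradient fixes only the direction of each step; its length is governed by the assumed $\eta$.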