
In the gradient descent algorithm, say $f(x)$ (a quadratic function) is the objective function. So the algorithm is defined as

$x_i \leftarrow x_i - a\,\frac{\partial f(x)}{\partial x_i}$

I just don't quite understand the meaning of doing a subtraction. I can intuitively follow that we are going in the direction of steepest descent, but I have some questions. The derivative of $f(x)$ gives us the equation of a line, so when we substitute the value of $x_i$ into $f'(x)$, what we get is a $y$ coordinate: $y_i$. So I don't understand how we can subtract a $y$ coordinate from an $x$ coordinate?

3 Answers


The direction of $\nabla f$ is the direction of greatest increase of $f$. (This can be shown by writing out the directional derivative of $f$ using the chain rule, and comparing the result with a dot product of the direction vector with the gradient vector.) You want to go toward the direction of greatest decrease, so move along $-\nabla f$.
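To see this concretely, here is a minimal sketch (my own illustration, not part of the original answer; the quadratic $f$, the starting point, and the step size $a$ are all assumptions made for the example) of the update $x \leftarrow x - a\,f'(x)$ on a one-dimensional quadratic:

```python
# Minimal gradient descent on the 1-D quadratic f(x) = (x - 3)^2.
# The update x <- x - a * f'(x) steps against the slope, so f decreases.

def f(x):
    return (x - 3.0) ** 2

def f_prime(x):
    return 2.0 * (x - 3.0)   # derivative of f; positive to the right of 3

x = 10.0   # starting point (arbitrary)
a = 0.1    # step size, assumed small enough for convergence

for _ in range(50):
    x = x - a * f_prime(x)   # move along -f'(x), the direction of decrease

print(x, f(x))  # x approaches the minimizer 3, f(x) approaches 0
```

Note that $f'(x)$ here is used as a slope (a rate of change), not as a point on the graph; subtracting a multiple of the slope from $x$ walks the iterate downhill.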

  • Okay, I figured it out. Thanks!! (2012-09-07)


But still, why is it MINUS?

Because your goal is to MINIMIZE $J(\theta)$: the slope $\frac{\partial J}{\partial \theta}$ points in the direction in which $J$ increases, so stepping against it (subtracting) moves you downhill toward the minimum.


So, in a maximization problem, you would instead need to ADD $\alpha \cdot \text{slope}$ (gradient ascent).
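As a rough illustration of the sign difference (the quadratic $J$, the value of $\alpha$, and the iteration count below are assumptions for the sketch, not taken from this answer):

```python
# Descent vs. ascent on J(theta) = theta^2, an illustrative convex objective.
# Subtracting alpha * slope minimizes J; adding it would maximize instead.

def grad_J(theta):
    return 2.0 * theta          # slope of J(theta) = theta^2

alpha = 0.1
theta = 5.0

for _ in range(100):
    theta = theta - alpha * grad_J(theta)   # MINUS: walk downhill on J

print(theta)  # close to 0, the minimizer of J

# With theta = theta + alpha * grad_J(theta) the iterate would move
# uphill instead (and here diverge, since this J has no maximum).
```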


My understanding of this minus sign rests on the assumption behind SGD: the objective function $J$ is convex, with an optimal solution (global or local) at $\theta_*$ where the partial derivatives are $0$. The parameters are therefore updated in the direction opposite to the one in which the function changes fastest, because SGD wants $J$ to change more and more slowly until it gradually settles at the convex point.
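A small sketch of this behavior under the stated convexity assumption (the toy dataset, learning rate, and seed are illustrative choices of mine, not from the answer):

```python
# SGD sketch: minimize J(theta) = average of (theta - x_i)^2 over a dataset.
# Near theta_* the stochastic gradients average to 0, so the steps shrink
# and the iterate settles near the minimizer (here, the data mean).

import random

data = [1.0, 2.0, 3.0, 4.0]   # toy dataset; theta_* is its mean, 2.5
theta = 10.0
alpha = 0.05

random.seed(0)
for _ in range(500):
    x = random.choice(data)        # one random sample per update
    grad = 2.0 * (theta - x)       # gradient of (theta - x)^2
    theta = theta - alpha * grad   # minus: move against the gradient

print(theta)  # hovers near 2.5, up to SGD noise
```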

  • Welcome to the Mathematics Stack Exchange community! The quick tour (https://math.stackexchange.com/tour) will help you get the most benefit from your time here. Also, please use MathJax for your equations; my favorite reference is https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference. (2018-12-27)