In gradient descent algorithms (especially in the context of multivariate regression), when the gradient at a given point is discussed, I sometimes see the notation $\nabla f(x_n)$ and sometimes the notation $\frac{\partial}{\partial x_n} f$.

For example:

  • Wikipedia says: $x_{n+1} = x_n - \alpha \nabla f(x_n) $
  • Andrew Ng (on Coursera) says: $\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J (\theta)$

So I was wondering whether the two are the same, or whether there are differences that determine when we should use one notation over the other.

  • In a Hilbert space, there is a correspondence between linear functionals and points in the space. The gradient is the point corresponding to the linear functional $h \mapsto {\partial f(x) \over \partial x} h$. (2017-02-13)
  • I'm just starting out, so for someone who doesn't know (yet) what a Hilbert space is, can I consider these two notations to be the same? (2017-02-13)
  • If you are working in $\mathbb{R}^n$, then think of ${\partial f(x) \over \partial x}$ as a row vector and $\nabla f(x) = {\partial f(x) \over \partial x}^T$ as a column vector. (2017-02-13)
  • Yes, I'm working in $\mathbb{R}^n$. So can I assume that these vectors contain the same values, since they are transposes of each other? (2017-02-13)
  • Yes. In $\mathbb{R}^n$ they are just transposes of each other. (2017-02-13)
  • Thank you very much, it's much clearer now (I've spent hours on this). (2017-02-13)
  • The 'advantage' of the gradient is that you can think of it as a point in the same space as '$x$'. Linear functionals are a little less concrete to think about. (2017-02-13)
  • But for optimization methods, do I need linear functionals, or is the gradient information enough? (2017-02-13)
  • They contain exactly the same information in $\mathbb{R}^n$. (2017-02-13)

1 Answer


$\nabla f(x_n)$ does NOT mean $\partial f/\partial x_n$. In the context of the Wikipedia article on gradient descent, $x_n$ is just a point in (say) $\mathbb{R}^3$. For example, $x_n$ could be the point $(1,0,2)$.

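$\nabla f(x_n)$ is the vector $(\frac{\partial f}{\partial x} (x_n),\frac{\partial f}{\partial y} (x_n),\frac{\partial f}{\partial z} (x_n))$, i.e. the partial derivatives of $f$ with respect to every coordinate, all evaluated at the point $x_n$.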

$\partial f/\partial x_n$ implies that you are labelling the coordinates $x_1, x_2,$ etc. If, say, $x_n$ happens to be the coordinate $z$ in the 3D case, then this is $\partial f/\partial z$, a single partial derivative (which you would then evaluate at some particular point).
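A minimal Python sketch of the distinction, assuming a hypothetical function $f(x,y,z) = x^2 + 3yz + z^2$ and estimating derivatives with central finite differences (a numerical approximation swapped in for the exact derivatives; the helper names `partial` and `grad` are illustrative). `grad` returns a vector of three numbers at the point $(1,0,2)$, while `partial(f, x_n, 2)` returns one number, $\partial f/\partial z$ at that point, which is simply the third component of the gradient. The exact values are $\nabla f(1,0,2) = (2, 6, 4)$.

```python
def f(p):
    """Hypothetical example: f(x, y, z) = x**2 + 3*y*z + z**2."""
    x, y, z = p
    return x**2 + 3 * y * z + z**2


def partial(f, p, i, h=1e-5):
    """Central-difference estimate of the partial derivative of f with
    respect to coordinate i, evaluated at the point p."""
    forward = list(p)
    forward[i] += h
    backward = list(p)
    backward[i] -= h
    return (f(forward) - f(backward)) / (2 * h)


def grad(f, p, h=1e-5):
    """Gradient of f at p: one partial derivative per coordinate,
    all evaluated at the same point p."""
    return [partial(f, p, i, h) for i in range(len(p))]


x_n = (1.0, 0.0, 2.0)       # the point x_n = (1, 0, 2) from the answer
g = grad(f, x_n)            # a vector, approximately [2.0, 6.0, 4.0]
dfdz = partial(f, x_n, 2)   # a single number, approximately 4.0 (equal to g[2])
print(g, dfdz)
```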

Basically, I think the Wikipedia article is using vector notation and Ng is using coordinate notation, and the two are essentially the same (I'm not entirely sure, especially since there's a mistake in your example). But the real issue is the one in your first line: $\nabla f(x_n)$ and $\partial f/\partial x_n$ are not the same thing.
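To check that the two update rules from the question agree, here is a minimal Python sketch under assumed inputs: the cost `J`, its hand-derived partials `dJ_dtheta`, the step size `alpha = 0.1`, and the starting point are all hypothetical, chosen only for illustration. The coordinate form reproduces the vector form as long as every $\theta_j$ is updated simultaneously, i.e. all partials are evaluated at the old $\theta$.

```python
def J(theta):
    """Hypothetical cost: J(theta_0, theta_1) = (theta_0 - 1)**2 + 2*(theta_1 + 3)**2."""
    t0, t1 = theta
    return (t0 - 1) ** 2 + 2 * (t1 + 3) ** 2


def dJ_dtheta(theta, j):
    """Hand-derived partial derivative of the hypothetical J w.r.t. theta_j."""
    t0, t1 = theta
    return 2 * (t0 - 1) if j == 0 else 4 * (t1 + 3)


def grad_J(theta):
    """Gradient of J: the vector of all partials, evaluated at the same theta."""
    return [dJ_dtheta(theta, j) for j in range(len(theta))]


def step_vector(theta, alpha):
    """Vector form: theta <- theta - alpha * grad J(theta)."""
    g = grad_J(theta)
    return [t - alpha * gj for t, gj in zip(theta, g)]


def step_coordinates(theta, alpha):
    """Coordinate form: theta_j <- theta_j - alpha * dJ/dtheta_j for every j.
    Every partial is evaluated at the OLD theta (a simultaneous update)."""
    old = list(theta)
    return [old[j] - alpha * dJ_dtheta(old, j) for j in range(len(old))]


def descend(theta, alpha, steps):
    """Repeat the vector-form update a fixed number of times."""
    for _ in range(steps):
        theta = step_vector(theta, alpha)
    return theta


theta_0 = [0.0, 0.0]
print(step_vector(theta_0, 0.1))       # approximately [0.2, -1.2]
print(step_coordinates(theta_0, 0.1))  # the same numbers
print(descend(theta_0, 0.1, 200))      # approaches the minimizer [1.0, -3.0]
```

A sequential update (overwriting $\theta_0$ before computing $\partial J/\partial\theta_1$) would in general not match the vector form; for this particular $J$ it happens to agree, because each partial depends only on its own coordinate.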

  • Thank you very much, I was confusing evaluating the gradient at some point with computing the partial derivatives with respect to the parameters. I just worked through my equations, and now Wikipedia's notation and Ng's match up. (2017-02-13)
  • Follow-up question: Say $x$ is a vector. Then what is the difference between $\frac{\partial f}{\partial x}$ and $\nabla_x f$? Does this change anything? (2017-05-18)
  • @TokeFaurby In that context they mean exactly the same thing. (2017-05-18)