I've used the gradient descent algorithm for some time now, but I only have an intuitive understanding of it; I never learned it in a mathematics course. The gradient descent rule I know is:
$$ x_k = x_{k-1} - \frac{1}{\eta_k}\nabla f(x_{k-1}) $$
which I intuitively understand as moving against the slope to reach the local minimum. However, recently I came across the following definition:
$$ x_k = \operatorname*{arg\,min}_x \; \langle \nabla f(x_{k-1}), x \rangle + \frac{\eta_k}{2}\|x - x_{k-1}\|_2^2 $$
I cannot understand the relationship between the two, nor can I understand the second equation intuitively. You choose $x$ such that it minimizes the inner product with the gradient plus the squared distance from the previous $x$ value? That makes little sense to me.
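As a sanity check I tried comparing the two numerically on a made-up example (a simple quadratic $f$ and an arbitrary step size of my own choosing), minimizing the second expression with an off-the-shelf optimizer, and the two updates do seem to land on the same point:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical test function: f(x) = (1/2) x^T A x, so grad f(x) = A x.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
grad_f = lambda x: A @ x

x_prev = np.array([1.0, -1.0])  # arbitrary previous iterate
eta = 4.0                        # arbitrary eta_k

# First formula: explicit gradient-descent step with step size 1/eta.
x_gd = x_prev - (1.0 / eta) * grad_f(x_prev)

# Second formula: numerically minimize
#   g(x) = <grad f(x_prev), x> + (eta/2) * ||x - x_prev||^2
g = lambda x: grad_f(x_prev) @ x + (eta / 2.0) * np.sum((x - x_prev) ** 2)
x_argmin = minimize(g, x_prev).x

print(x_gd)
print(x_argmin)
print(np.allclose(x_gd, x_argmin, atol=1e-5))
```

So empirically they agree, at least for this quadratic, but I still don't see why, or what the second formulation is supposed to mean.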
Please help.