
I've used the gradient descent algorithm for some time now, but I only have an intuitive understanding of it; I never learned it in a mathematics course. The gradient descent rule I know is:

$$ x_k = x_{k-1} - \frac{1}{\eta_k}\nabla f(x_{k-1}) $$

which I intuitively understand as moving against the slope to reach the local minimum. However, recently I came across the following definition:

$$ x_k = \arg\min_x \; \langle \nabla f(x_{k-1}), x \rangle + \frac{\eta_k}{2}\|x - x_{k-1}\|_2^2 $$

I cannot understand the relationship between the two, nor can I understand the second equation intuitively. You choose $x$ so that it minimizes a combination of the inner product with the gradient and the distance from the previous iterate? That makes little sense to me.

Please help.

1 Answer


I believe what you are trying to write is a form of generalized gradient descent.

Normally, gradient descent is written:
$$ x_{t+1} = x_t - \alpha \nabla f(x_t) $$

Recall that our goal is to find $\tilde{x}=\arg\min_x f(x)$. So let's locally approximate $f$ by its first-order Taylor expansion, replacing the unknown remainder with a quadratic penalty:
\begin{align} f(x) &= f(a) + \nabla f(a)^T[ x-a] + o(\|x-a\|) \\ &\approx f(a) + \nabla f(a)^T[ x-a] + \frac{\gamma}{2}\|x-a\|^2 \end{align}

Now just use this local model to minimize $f$:
\begin{align} x_{t+1} &= \arg\min_z \underbrace{f(x_t)}_{\text{No $z$}} + \nabla f(x_t)^T[z-x_t] + \frac{\gamma}{2}\|z-x_t\|^2\\ &= \arg\min_z \nabla f(x_t)^T[z-x_t] + \frac{\gamma}{2}\|z-x_t\|^2 \end{align}
which is the form you wrote, with $\gamma = \eta_k$ (note that $\langle \nabla f(x_t), z \rangle$ and $\nabla f(x_t)^T[z-x_t]$ differ only by a constant that does not depend on $z$, so they have the same minimizer).

To see that this recovers the first rule, note the objective is a strictly convex quadratic in $z$, so setting its gradient to zero gives $\nabla f(x_t) + \gamma(z - x_t) = 0$, i.e.
$$ x_{t+1} = x_t - \frac{1}{\gamma}\nabla f(x_t), $$
which is exactly the ordinary gradient-descent update with step size $\alpha = 1/\gamma$.
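A quick numerical sanity check of the derivation above: for a concrete function, the plain gradient step $x_t - \frac{1}{\gamma}\nabla f(x_t)$ is the minimizer of the surrogate $g(z) = \nabla f(x_t)^T[z - x_t] + \frac{\gamma}{2}\|z - x_t\|^2$. The quadratic $f$ below (matrix `A`, vector `b`) is an illustrative choice of mine, not from the post:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, -1.0])

def grad_f(x):
    # gradient of f(x) = 0.5 * x^T A x - b^T x
    return A @ x - b

x_t = np.array([2.0, 2.0])
gamma = 5.0                               # 1/gamma plays the role of the step size

def g(z):
    # local surrogate: linearization of f plus the quadratic penalty
    return grad_f(x_t) @ (z - x_t) + 0.5 * gamma * np.sum((z - x_t) ** 2)

x_step = x_t - grad_f(x_t) / gamma        # the ordinary gradient-descent update

# g is strictly convex, so x_step should beat every nearby point
perturbed = [x_step + eps * np.array([np.cos(t), np.sin(t)])
             for eps in (1e-3, 0.1, 1.0)
             for t in np.linspace(0.0, 2 * np.pi, 8)]
assert all(g(x_step) < g(z) for z in perturbed)
print("surrogate value at the gradient step:", g(x_step))
```

In closed form, $g(x_{t+1}) = -\|\nabla f(x_t)\|^2 / (2\gamma)$; here $\nabla f(x_t) = (7, 7)$, so the printed value is $-98/10 = -9.8$.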

Intuitively, the first term determines the direction of the step, while the second regularizes its length. Essentially, you want to balance staying close to $x_t$ (via the penalty weighted by the step-size constant $\gamma$) against following the negative gradient direction (making the projection of the step $z-x_t$ onto the gradient as negative as possible).
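This balancing act can be seen by iterating the surrogate-minimization update for different penalty weights $\gamma$. A minimal sketch, again on an illustrative quadratic of my own choosing (a larger $\gamma$ means each local model is trusted less, so steps are smaller but the iteration still converges):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])    # symmetric positive definite
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(A, b)            # true minimizer of 0.5 x^T A x - b^T x

def grad_f(x):
    return A @ x - b

for gamma in (4.0, 10.0):                 # larger gamma = smaller, more cautious steps
    x = np.array([2.0, 2.0])
    for _ in range(200):
        x = x - grad_f(x) / gamma         # each step minimizes the local surrogate
    print(f"gamma={gamma}: distance to minimizer = {np.linalg.norm(x - x_star):.2e}")
```

Both runs converge because $\gamma$ exceeds the largest eigenvalue of $A$ (about $3.62$); with $\gamma$ too small, the local model underestimates the curvature and the iteration diverges.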