
I am asking about the so-called approximate gradient descent algorithm for solving the convex problem $$ \min_x f(x),$$ whose $(k+1)$th iterate has the form: $$ x_{k+1} = x_k - \gamma \tilde{\nabla}f(x_k),$$ where $\tilde{\nabla}f(x_k)$ is an approximation of $\nabla f(x_k)$ and $\gamma$ is some sufficiently small, positive step size parameter.

Is it possible to prove (formally) that $$f(x_{k+1}) \leq f(x_k),$$ or, more generally, that approximate gradient descent "works"? If so, how? Note that I am not using a stochastic approximation to the gradient, so probabilistic arguments do not apply here.

Any pointers, either to relevant papers or to thoughts on how one could proceed with a proof (e.g., conditions on $\gamma$ or on the error $\epsilon(x_k) = \tilde\nabla f(x_k) - \nabla f(x_k)$), would be helpful.
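For intuition, here is a minimal numerical sketch (my own toy example, not part of the question: I assume $f(x) = x^2$ and a relative-error model for the gradient) showing that a deterministic error which never flips the descent direction still yields $f(x_{k+1}) \le f(x_k)$ for a small step size:

```python
# A sketch, assuming f(x) = x^2 and a relative gradient-error model.
def f(x):
    return x * x

def grad(x):
    return 2.0 * x

def approx_grad(x):
    # assumed error model: epsilon(x) = 0.1 * grad f(x), so the
    # approximation never flips the descent direction
    return 1.1 * grad(x)

gamma = 0.1  # small positive step size
x = 5.0
values = [f(x)]
for _ in range(50):
    x = x - gamma * approx_grad(x)
    values.append(f(x))

# here the iteration contracts: x_{k+1} = 0.78 * x_k, so f decreases monotonically
assert all(b <= a for a, b in zip(values, values[1:]))
```

Of course this only illustrates one favorable error model; the question is what conditions on $\gamma$ and $\epsilon(x_k)$ guarantee this in general.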

  • You might try to tackle this from the standpoint of Lyapunov stability theory, i.e., you may consider $x_{k+1} = x_k - \gamma \tilde{\nabla}f(x_k)$ as a discrete-time dynamical system. – 2017-01-13
  • Could you elaborate, or point to some relevant reading (at or near an intro level)? – 2017-01-13

1 Answer


One option is to consider the dynamical system

$$ e_k = x_k - x^*, $$

where $x^*$ is the minimizer.

Then, with step size $\alpha$ (the question's $\gamma$) and the exact gradient, $e_{k+1} = e_k - \alpha \nabla f(x_k)$.

Take the candidate function $V(e_k) = V_k := e_k^2$ (treating the scalar case; in $\mathbb{R}^n$ one would use $\|e_k\|^2$).

Then,

$$ V_{k+1} - V_k = \left( e_k - \alpha \nabla f(x_k) \right)^2 - e_k^2 = \alpha^2 \left( \nabla f(x_k) \right)^2 - 2 \alpha e_k \nabla f(x_k). $$

If it is possible to show that $V_{k+1} - V_k < 0$ for all $e_k \ne 0$, then $V$ is called a Lyapunov function for the system, and in turn $e_k$ converges asymptotically to the equilibrium $e = 0$, whence $x_k \to x^*$.

Observe that, for convex $f$ with minimizer $x^*$, $e_k < 0 \implies \nabla f(x_k) < 0$ and $e_k > 0 \implies \nabla f(x_k) > 0$; that is, $e_k \nabla f(x_k) > 0$ whenever $e_k \ne 0$. So, if $\alpha$ is taken positive, the condition

$$ \alpha_k < 2\,\frac{|e_k|}{|\nabla f(x_k)|}, \qquad e_k = 0 \implies \alpha_k := 0 $$

is a possible choice of an adaptive gain $\alpha_k$ which renders the equilibrium asymptotically stable.
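The Lyapunov decrease can be checked numerically. A sketch (my own, under the assumptions $f(x) = (x-1)^2$, so $x^* = 1$, and an adaptive gain at half the bound above; note that $e_k$ requires knowing $x^*$, so this only verifies the argument rather than giving a practical rule):

```python
# Numerical check of V_{k+1} < V_k with the adaptive gain
# alpha_k = 0.5 * |e_k| / |grad f(x_k)|, which satisfies alpha_k < 2|e_k|/|grad f(x_k)|.
# Assumed problem: f(x) = (x - 1)^2, so x* = 1 and grad f(x) = 2(x - 1).
x_star = 1.0

def grad(x):
    return 2.0 * (x - x_star)

x = 4.0
V = [(x - x_star) ** 2]
for _ in range(30):
    e = x - x_star
    g = grad(x)
    alpha = 0.0 if e == 0 else 0.5 * abs(e) / abs(g)  # e_k = 0 => alpha_k := 0
    x = x - alpha * g
    V.append((x - x_star) ** 2)

# V_k = e_k^2 shrinks by a factor of 1/4 each step: e_{k+1} = 0.5 * e_k
assert all(b < a for a, b in zip(V, V[1:]) if a > 0)
```

Any gain of the form $\alpha_k = c\,|e_k|/|\nabla f(x_k)|$ with $0 < c < 2$ gives $e_{k+1} = (1-c)\,e_k$ here, consistent with the strict inequality in the condition.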

If the gradient is only approximate, then the condition that its sign coincide with the sign of $e_k$ suffices for the argument to go through. Otherwise, you need to identify the region where this sign condition holds; its complement is then the set that attracts the iterates. Indeed, if in some region the approximate gradient changes sign arbitrarily, then no stability can be shown there.
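The sign condition can also be illustrated numerically. A sketch (again my own example, assuming $f(x) = (x-1)^2$ and a multiplicative perturbation of the gradient that is large but never changes its sign):

```python
import math

# Assumed problem: f(x) = (x - 1)^2, x* = 1, grad f(x) = 2(x - 1).
# The approximate gradient carries up to 40% relative error, but the
# perturbation factor stays positive, so sign(approx) = sign(e_k) always.
def approx_grad(x):
    g = 2.0 * (x - 1.0)                          # exact gradient
    return g * (1.0 + 0.4 * math.sin(7.0 * x))   # sign-preserving error

gamma = 0.1
x = 3.0
V = [(x - 1.0) ** 2]
for _ in range(100):
    x = x - gamma * approx_grad(x)
    V.append((x - 1.0) ** 2)

# each step multiplies e_k by a factor in [0.72, 0.88], so V decreases strictly
assert all(b < a for a, b in zip(V, V[1:]))
```

If the perturbation factor were allowed to go negative somewhere, the sign condition would fail in that region and the monotone decrease of $V_k$ could break there, as the answer notes.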