
I am writing a computer program that searches for the minimum of a multivariate function $f: \mathbb{R}^n \to \mathbb{R}$. This function is in fact the sum of many functions:

$f(x) = \sum_{i=1}^m f_i(x)$

The computation starts from some guess $x_0$ of a minimum point, and at each iteration $j$ it improves the current guess $x_j$ into a better one, $x_{j+1}$ (using some variant of Newton's method or steepest descent). As $x_j$ approaches the minimum, the gradient approaches 0. The problem is that, to compute the gradient, I need to sum many gradients:

$\nabla f = \sum_{i=1}^m \nabla f_i$

and the $\nabla f_i$'s are nonzero. Since their sum is close to zero, computing it in floating-point arithmetic incurs a large accumulated cancellation error, and the result is very inaccurate. When I try to approximate the gradient by evaluating $f$ at different points near $x_j$, the approximate gradient indeed comes out quite different from the computed gradient. (In comparison, at points where $\nabla f$ is far from zero, the computed and approximate gradients agree closely.)

Does anyone know of a good method to compute the gradient near extremal points?

  • 1
    There are techniques for accurately computing the sum of many floating-point numbers without too much cancellation error; see Higham's "[Accurate and Efficient Floating Point Summation](http://www.maths.manchester.ac.uk/~nareports/narep198.pdf)". (I presume you're using double precision already, of course -- if not, forget everything and do that first.) 2013-06-07
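To make the comment above concrete, here is a minimal sketch of compensated (Kahan) summation applied component-wise to $\nabla f = \sum_i \nabla f_i$; the Python/NumPy interface and the function name are illustrative assumptions, not the OP's code:

```python
import numpy as np

def kahan_sum_gradients(grad_terms):
    """Sum a list of gradient vectors with Kahan (compensated) summation.

    grad_terms: list of 1-D NumPy arrays, one per term grad f_i(x).
    The compensation variable recovers low-order bits that a naive
    running sum would lose, which matters when the terms largely cancel.
    """
    total = np.zeros_like(grad_terms[0])
    comp = np.zeros_like(grad_terms[0])   # running compensation for lost low-order bits
    for g in grad_terms:
        y = g - comp                      # corrected term
        t = total + y                     # low-order bits of y may be lost here
        comp = (t - total) - y            # recover what was lost
        total = t
    return total

# Hypothetical usage:
# grad = kahan_sum_gradients([grad_f_i(x) for grad_f_i in gradient_terms])
```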

3 Answers

1

There are basically three methods for providing the gradient to an optimization algorithm:

  1. You provide a function gradF(x1,x2...) in which you implement the analytic form of the gradient function.

  2. You use finite differences to numerically approximate the gradient (see the sketch after this list).

  3. You can use symbolic (automatic) differentiation (i.e. to generate the gradient function implementation given the implementation of your objective function).
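For option 2, here is a minimal central-difference sketch in Python; the function name `fd_gradient`, the default step size, and the NumPy-array interface are my own illustrative assumptions:

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Approximate grad f(x) by central differences.

    f : callable taking a 1-D NumPy array and returning a float
    x : 1-D NumPy array, the point at which to approximate the gradient
    h : step size (too small amplifies rounding error, too large adds truncation error)
    """
    grad = np.zeros_like(x, dtype=float)
    for k in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[k] = h                          # perturb only coordinate k
        grad[k] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad
```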

Also, you have to understand the behaviour of your function and adjust the termination conditions, such as the function tolerance (i.e. $|f(x_i)-f(x_{i-1})|$, where $i$ is the step number) and the value tolerance (i.e. $\|x_i-x_{i-1}\|$).
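A sketch of how those two tolerances might be checked at each iteration; the name `should_stop` and the relative scaling are assumptions, not part of any particular library:

```python
import numpy as np

def should_stop(f_old, f_new, x_old, x_new, f_tol=1e-10, x_tol=1e-8):
    """Stop when neither the function value nor the iterate changes meaningfully."""
    f_ok = abs(f_new - f_old) <= f_tol * (1.0 + abs(f_old))
    x_ok = np.linalg.norm(x_new - x_old) <= x_tol * (1.0 + np.linalg.norm(x_old))
    return f_ok and x_ok
```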

  • 0
    Use the value tolerance $x_i - x_{i-1}$, where $i$ is the step number, as a termination criterion. 2012-09-24
0

If you have a computer program that already computes the values of $f_i$, and you need to compute derivatives, I would strongly avoid finite difference methods, if possible.

Instead, I would look into using Automatic Differentiation. AD is a technique that allows one to compute the derivative of computer code with respect to its inputs, without needing to resort to finite differencing. There are free, open-source tools for this; for instance, TAPENADE.
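TAPENADE itself works by transforming Fortran/C source, so the following is only a toy Python illustration of the idea behind forward-mode AD (dual numbers carrying a value together with its derivative); it is not what an AD tool would actually generate:

```python
import math

class Dual:
    """Toy forward-mode AD value: carries f(x) and df/dx together."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)  # product rule
    __rmul__ = __mul__

def sin(x):
    # Chain rule for sin applied to a Dual; plain floats pass through.
    return Dual(math.sin(x.val), math.cos(x.val) * x.der) if isinstance(x, Dual) else math.sin(x)

# d/dx [ x*x + sin(x) ] at x = 1.5, exact to machine precision, no differencing step:
x = Dual(1.5, 1.0)            # seed derivative dx/dx = 1
y = x * x + sin(x)
print(y.val, y.der)           # value and derivative (2x + cos x = 3 + cos 1.5)
```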

  • 0
    But the OP **said** that (s)he coded analytic derivatives. 2014-05-08
0

Your question isn't an optimization question but rather an IEEE floating-point arithmetic question. You want to sum the terms in an order that minimizes cancellation errors. Look in any introductory book or course on scientific computing and check the chapter on finite-precision arithmetic.
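As a rough Python illustration (the helper name is mine): sorting the terms of each component sum by increasing magnitude before adding is one common ordering heuristic, and `math.fsum` gives a correctly rounded sum when cancellation is severe:

```python
import math

def careful_sum(terms):
    """Sum floats after sorting by increasing magnitude.

    This helps when many terms share a sign; with heavy cancellation between
    positive and negative terms, math.fsum (correctly rounded) is the safer choice.
    """
    return sum(sorted(terms, key=abs))

terms = [1e16, 1.0, -1e16, 1.0]   # exact sum is 2.0
print(sum(terms))                 # naive left-to-right order drops one term: 1.0
print(careful_sum(terms))         # small terms accumulated first: 2.0
print(math.fsum(terms))           # correctly rounded: 2.0
```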

Edit: I just saw Rahul's comment, which is the only sensible comment on this page.