
I'm using Python here, but even if you're not a Python expert you may be able to help.

I have a family of curves (cubic splines fitted to data) that look like periodic functions, and I'd like to shift each curve along the x axis so as to minimize the sum of squared distances between each curve and the mean curve over a specified region.

In other words, for a family of n curves $f_i(x)$ and mean curve $g(x)$, I want to find a vector of $\delta$ values that minimizes the following over region $(a,b)$:

$ \sum_{i=1}^n \int_a^b (f_i(x+\delta_i) - g(x))^2 dx $

In practice, n is around 50. Since shifting the curves also changes the mean curve, the whole procedure is repeated iteratively until no shift exceeds a chosen threshold.
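The outer loop looks roughly like this (a sketch; make_mean and align_to_mean are placeholder names for the mean construction and the per-pass minimization described below, and max_iter and tol are thresholds I'd pick):

    import numpy as np

    deltas = np.zeros(len(fs))
    for _ in range(max_iter):                 # max_iter, tol: chosen thresholds
        g = make_mean(fs, deltas)             # mean of the currently shifted curves
        new_deltas = align_to_mean(fs, g)     # one minimization pass (see below)
        if np.max(np.abs(new_deltas - deltas)) < tol:
            break                             # no curve moved beyond the threshold
        deltas = new_deltas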

I'm using scipy.optimize.minimize to attempt to minimize the following function:

lambda delta: sum(scipy.integrate.quad(lambda x: (f(x + d) - g(x)) ** 2, *day_region)[0] for f, d in zip(fs, delta))

where

g = lambda x: sum(f(x) for f in fs) / len(fs) 

fs is a list of scipy.interpolate.LSQUnivariateSpline objects. They have an "integrate" method that computes a definite integral quickly, but at the moment I'm not taking advantage of it. After profiling, it looks like the sheer number of calls to the function being minimized is what makes this take so long.
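For concreteness, here is a condensed, self-contained version of my current setup, with synthetic data standing in for my real curves (day_region is the $(a, b)$ interval):

    import numpy as np
    import scipy.integrate
    from scipy.interpolate import LSQUnivariateSpline
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    xdata = np.linspace(0, 10, 200)
    knots = np.linspace(1, 9, 8)                 # interior knots for the fit
    fs = [LSQUnivariateSpline(xdata,
                              np.sin(xdata - s) + 0.05 * rng.standard_normal(xdata.size),
                              knots)
          for s in rng.uniform(-0.5, 0.5, size=5)]
    g = lambda x: sum(f(x) for f in fs) / len(fs)
    day_region = (2.0, 8.0)

    objective = lambda delta: sum(
        scipy.integrate.quad(lambda x: (f(x + d) - g(x)) ** 2, *day_region)[0]
        for f, d in zip(fs, delta))

    res = minimize(objective, np.zeros(len(fs)))  # this is the slow call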

Is there a smarter way to do this to make it faster, or some way to rework this analytically so that it's easier to solve?

  • That is possible, but the function remains continuous beyond the first and last pieces of the spline (where it will head off toward -inf or inf), and the squared deviations increase rapidly outside the range of the data. Delta could also be bounded to stay within the useful portion of the splines. (2012-10-16)

1 Answer


Since your splines are cubic, I believe you could use scipy.optimize.fmin_l_bfgs_b for the minimization. This function requires that you provide both the value of your objective function and its gradient with respect to all the variables $\Delta = [\delta_i]_i$, for any value of $\Delta$. However, I must warn you that the computation of this gradient with the formulas I provide is quadratic in $n$. For the rest, we define $\mathcal{F}$ as your objective function, so that $ \mathcal{F}(\Delta) = \sum_{i=1}^n \int_a^b (f_i(x+\delta_i) - g(x))^2 \,\mathrm{d}x. $

Value:

I assume that you defined $ g(x) = \frac{1}{n}\sum_{j=1}^n f_j(x+\delta_j), $ i.e. the mean of the currently shifted curves.

Computing the value of your objective function then reduces to evaluating integrals of the shifted splines, using the change of variables $ \int_a^b f_i(x+\delta_i)\,\mathrm{d}x = \int_{a+\delta_i}^{b+\delta_i} f_i(x)\,\mathrm{d}x, $ which you can do with the splines' provided “integrate” method by shifting $a$ and $b$ by $\delta_i$.
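In code, that change of variables is just a shift of the limits passed to integrate (a sketch reusing fs and a vector deltas from the question):

    # int_a^b f_i(x + delta_i) dx  ==  f_i.integrate(a + delta_i, b + delta_i)
    shifted_integrals = [f.integrate(a + d, b + d) for f, d in zip(fs, deltas)]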

Gradient:

If I didn't mess up, the gradient of your objective function with respect to the variable $\delta_i$ is $ \begin{aligned} \frac{\partial\mathcal{F}}{\partial\delta_i}(\Delta) &= \int_a^b 2(f_i(x+\delta_i)-g(x)) \frac{\partial f_i}{\partial x}(x+\delta_i) \,\mathrm{d}x\\ &\quad-\frac{1}{n} \sum_{j=1}^n \int_a^b 2(f_j(x+\delta_j)-g(x)) \frac{\partial f_i}{\partial x}(x+\delta_i) \,\mathrm{d}x. \end{aligned} $ This is where things get nasty, since you have to compute integrals for all the pairs $(f_j, \tfrac{\partial f_i}{\partial x})_{i,j}$. From the scipy documentation it seems that the splines also provide a way to evaluate their derivatives at any point, so you could compute these integrals numerically. Since your splines are cubic, you can even compute the integrals exactly with a quadrature rule, but this requires a closer look at your splines: when you define a spline you provide its knot vector, so you know where the polynomial pieces meet. By splitting the $[a,b]$ segment at every (shifted) knot, each integrand becomes a single polynomial of degree at most 5 (a cubic times a quadratic) on every resulting segment, so a 3-point Gauss–Legendre rule per segment gives the exact integral.
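Here is a sketch of that piecewise Gauss–Legendre idea for one building-block integral $\int_a^b (f_j(x+\delta_j)-g(x))\,f_i'(x+\delta_i)\,\mathrm{d}x$ (my own illustration, not tested against your data; breakpoints is assumed to already contain every shifted knot of the splines involved, so that the integrand is a single polynomial on each segment):

    import numpy as np

    nodes, weights = np.polynomial.legendre.leggauss(3)  # exact up to degree 5

    def pair_integral(fj, dj, fi, di, g, a, b, breakpoints):
        """int_a^b (f_j(x + d_j) - g(x)) * f_i'(x + d_i) dx, segment by segment."""
        total = 0.0
        pts = np.concatenate(([a], np.asarray(breakpoints), [b]))
        for lo, hi in zip(pts[:-1], pts[1:]):
            xq = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)  # map nodes to [lo, hi]
            wq = 0.5 * (hi - lo) * weights
            # nu=1 evaluates the first derivative of the spline
            total += np.dot(wq, (fj(xq + dj) - g(xq)) * fi(xq + di, nu=1))
        return total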

By feeding the value and gradient of the objective function to the L-BFGS-B solver, you may converge superlinearly to one of its local minima.
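The wiring then looks roughly like this (a sketch: objective and gradient stand for value and gradient routines built as above, and max_shift is a bound you would choose from your data):

    import numpy as np
    from scipy.optimize import fmin_l_bfgs_b

    def value_and_grad(delta):
        # fmin_l_bfgs_b accepts a function returning (value, gradient)
        return objective(delta), gradient(delta)

    delta0 = np.zeros(len(fs))
    bounds = [(-max_shift, max_shift)] * len(fs)  # keep shifts inside the data
    delta_opt, f_opt, info = fmin_l_bfgs_b(value_and_grad, delta0, bounds=bounds)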