I'm using Python here, but even if you're not a Python expert you may be able to help.
I have a family of curves (cubic splines fitted to data) that look like periodic functions, and I'd like to shift each curve horizontally along the x axis so as to minimize the sum of squared distances between each curve and the mean curve over a specified region.
In other words, for a family of n curves $f_i(x)$ and mean curve $g(x)$, I want to find a vector of $\delta$ values that minimizes the following over region $(a,b)$:
$ \sum_{i=1}^n \int_a^b (f_i(x+\delta_i) - g(x))^2 dx $
In practice, n is around 50. Since shifting the curves also changes the mean curve, the whole procedure is repeated iteratively until the shifts fall below a threshold.
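Roughly, the outer loop looks something like this (a simplified sketch, not my actual code: find_shifts stands in for the minimization step described below, and the helper names are only for illustration):

```python
import numpy as np

def shifted(f, d):
    """Return a callable that evaluates f at x + d, i.e. f shifted by d."""
    return lambda x: f(x + d)

def align(fs, find_shifts, day_region, threshold=1e-3, max_iter=20):
    """Iteratively shift every curve toward the current mean curve.

    find_shifts(fs, g, region) is a placeholder for the minimization step
    shown below; it should return one shift per curve.
    """
    total = np.zeros(len(fs))
    for _ in range(max_iter):
        g = lambda x: sum(f(x) for f in fs) / len(fs)       # current mean curve
        step = np.asarray(find_shifts(fs, g, day_region))   # one round of optimization
        fs = [shifted(f, d) for f, d in zip(fs, step)]      # apply this round's shifts
        total += step
        if np.max(np.abs(step)) < threshold:                # stop once shifts are negligible
            break
    return total
```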
I'm currently using scipy.optimize.minimize on the following objective function:
lambda delta: sum([scipy.integrate.quad(lambda x: (f(x+d) - g(x)) ** 2, *day_region)[0] for f, d in zip(fs, delta)])
where
g = lambda x: sum(f(x) for f in fs) / len(fs)
fs is a list of scipy.interpolate.LSQUnivariateSpline objects. They have an "integrate" method that can compute a definite integral quickly, but at the moment I'm not taking advantage of it. After profiling, it seems that the sheer number of times the objective function is called is what makes this take so long: each evaluation runs scipy.integrate.quad once per curve.
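For reference, here is a stripped-down, self-contained version of what I'm doing, with synthetic data standing in for my real curves (the knot placement, the region, and n = 5 are just for illustration; in my real code n is around 50):

```python
import numpy as np
from scipy import integrate, optimize
from scipy.interpolate import LSQUnivariateSpline

# Synthetic stand-in for my real data: noisy, shifted copies of one periodic signal.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 500)
true_shifts = rng.uniform(-0.3, 0.3, size=5)      # only 5 curves here; ~50 in practice
knots = np.linspace(0.5, 9.5, 20)                 # interior knots; placement is arbitrary
fs = [
    LSQUnivariateSpline(x, np.sin(2 * np.pi * (x - s) / 5) + 0.05 * rng.standard_normal(x.size), knots)
    for s in true_shifts
]

day_region = (2.0, 8.0)                           # the region (a, b) to align over
g = lambda x: sum(f(x) for f in fs) / len(fs)     # mean curve

def objective(delta):
    # Sum over curves of the squared L2 distance between the shifted curve and the mean.
    return sum(
        integrate.quad(lambda x, f=f, d=d: (f(x + d) - g(x)) ** 2, *day_region)[0]
        for f, d in zip(fs, delta)
    )

res = optimize.minimize(objective, np.zeros(len(fs)))
print(res.x)   # the fitted shifts; this minimize call is the part that takes very long
```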
Is there a smarter way to do this to make it faster, or some way to rework this analytically so that it's easier to solve?