1
$\begingroup$

imagine my function as a staircase with two steps. This function is to be fitted to some empirical data and I'm searching for an algorithm which minimizes the Root Mean Squared Error between this function and the data. The decision variables are L1 and L2 (for the level of the steps on the y-axis) and C (for the cut-off-point between the steps on the x-axis). The possible values for C on the x-axis are discreete and finite. Can you point me to algorithms or even possible solutions in Excel (VBA), Matlab or C++? The standard Excel solver is unable to solve this problem, because it requires a smooth function, but the (expensive) Premium solver seems able to do it.

My suggestion for an algorithm would be following 2-step method: Given the finite set of values for C, for each C, optimize L1 and L2. Then, optimum C* would the C with the lowest RMSE given L1* and L2*. Would that be correct? Or is there a more elegant way to it?

Any helpful comments are appreciated. Steve

  • 0
    with respect to your 1st comment: You are right, only one change of value, sorry for the confusion. $2$nd comment: correct. $3$d comment: sorry again. 4th comment: I realize that for a given C, I can evaluate L1* and L2* independently from one another. That's what I meant.2011-04-05

1 Answers 1

1

The example $\{(-1,-1),(-\epsilon,-\epsilon),(\epsilon,\epsilon),(1,1)\}$ shows that there can be more than one local minimum with respect to $C$: Splitting 1 & 3 either way is better than splitting 2 & 2. So it seems you can't avoid trying out every position for the step.

To find the optimal $C$ efficiently, note that the optimal $L_1$ and $L_2$ given $C$ will be the average of the data on each side of $C$, so that what you're trying to minimize is

$ \begin{eqnarray} && \sum_{x_i

where $N_<$ and $N_\ge$ are the numbers of points to the left and right of $C$, respectively. The first term is constant, so you can ignore it, so all you need to do is sweep through from left to right, in each step adding one function value to the left-hand sum and subtracting it from the right-hand sum and evaluating the expression in parentheses.

  • 0
    @Steve: Your last comment makes we wonder whether you've understood my answer. This should be a matter of milliseconds, not hours. Most importantly, $L1$ and $L2$ should not be fitted to the data iteratively -- as I wrote in the answer, they are simply the averages of the function values to the left and to the right of the split point, respectively, and even these averages don't have to be recalculated for each potential split point. I recommend that you try to implement my answer; I'll be happy to help if you have questions about it.2011-04-06