
I have a simple best-fit-line algorithm similar to this description.

Without storing the history of points, it is easy to calculate a rolling best-fit line, as long as we keep (store) the intermediate values used to calculate the BFL:

    sumX, sumY       //  sums of X and of Y
    sumX2, sumY2     //  sums of squares
    sumXY, count     //  sum of (X*Y), and the number of points

When a new point (Xn, Yn) arrives, simply update each sum by adding the corresponding term computed from Xn and Yn, then calculate the BFL:

    XMean = SumX / Count
    YMean = SumY / Count
    Slope = (SumXY - SumX * YMean) / (SumX2 - SumX * XMean)
    YInt  = YMean - Slope * XMean

to get

    Y = Slope * X + YInt
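A minimal Python sketch of the running-sums update and fit described above, assuming points arrive one at a time. The class and method names (`RunningFit`, `add`, `line`) are hypothetical and not from the original; the formulas are the ones listed above.

```python
class RunningFit:
    """Least-squares line y = slope*x + yint kept as running sums (no point history)."""

    def __init__(self):
        self.count = 0
        self.sum_x = self.sum_y = 0.0
        self.sum_x2 = self.sum_y2 = self.sum_xy = 0.0

    def add(self, x, y):
        # Fold the new point (Xn, Yn) into the stored sums.
        self.count += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_x2 += x * x
        self.sum_y2 += y * y  # not needed for the line itself; kept to mirror the question
        self.sum_xy += x * y

    def line(self):
        # Same formulas as in the question; returns (slope, yint).
        x_mean = self.sum_x / self.count
        y_mean = self.sum_y / self.count
        slope = (self.sum_xy - self.sum_x * y_mean) / (self.sum_x2 - self.sum_x * x_mean)
        yint = y_mean - slope * x_mean
        return slope, yint


fit = RunningFit()
for x, y in [(0, 1), (1, 3), (2, 5), (3, 7)]:  # points on y = 2x + 1
    fit.add(x, y)
print(fit.line())  # (2.0, 1.0)
```

The fit needs at least two distinct X values; otherwise the denominator `SumX2 - SumX * XMean` is zero.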

However, as the number of points grows, each new point has less and less effect on the best-fit line.

Is there a way to mathematically reduce the weight of the points already accumulated? For example, when the count reaches 20, modify the intermediate variables so that they represent the same line with the weight of only 10 points:

    when count = 20:
        count = count / 2
        sumX  = sumX / 2    // gives the same XMean, but is it the same line?
        sumY  = sumY / 2    // same YMean here as well

And the right adjustment for sumXY, sumX2 and sumY2 is even less clear to me.

Specifically, I'm looking for equations for these intermediate values that result in the same line but with less weight.
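A quick numerical check of the halving idea, as a sketch: if the count and *every* sum used by the fit are multiplied by the same factor, XMean and YMean are unchanged, and the numerator and denominator of Slope each pick up that same factor, so it cancels. The helper `fit_from_sums` and the sample points are illustrative assumptions, not from the original.

```python
def fit_from_sums(count, sum_x, sum_y, sum_x2, sum_xy):
    """Slope and intercept from the stored sums, using the question's formulas."""
    x_mean = sum_x / count
    y_mean = sum_y / count
    slope = (sum_xy - sum_x * y_mean) / (sum_x2 - sum_x * x_mean)
    return slope, y_mean - slope * x_mean


points = [(1, 2.0), (2, 2.9), (4, 5.2), (5, 5.8)]  # arbitrary sample data
sums = [
    len(points),                        # count
    sum(x for x, _ in points),          # sumX
    sum(y for _, y in points),          # sumY
    sum(x * x for x, _ in points),      # sumX2
    sum(x * y for x, y in points),      # sumXY
]

halved = [s / 2 for s in sums]  # scale the count and every sum by the same factor
print(fit_from_sums(*sums))
print(fit_from_sums(*halved))   # same line, up to floating-point rounding
```

sumY2 does not enter the slope or intercept, so for the line itself it can be scaled the same way or ignored. After halving, the history behaves like `count / 2` points, so each newly added point carries twice the relative weight it would have had otherwise.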

-TIA

  • @GerryMyerson That would mean re-doubling the updates every 20 points, which would eventually overflow. Since simply dividing the intermediate variables (SumX, SumY, ...) by two would distort the line, and I can't keep "sliding window" data, what I ended up doing was to use two sets of intermediate variables ('a' and 'b'). When 'a' reaches 10 points, copy it to 'b' and zero 'a'. That way the BFL computed from the combination of 'a' and 'b' always represents the latest 10-19 points. (2012-11-21)
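The two-bucket scheme from the comment can be sketched as follows, assuming a block size of 10 as described. The names `Sums`, `TwoBucketFit` and `BLOCK` are hypothetical. Because the sums are additive, the fit over 'a' + 'b' is just the fit computed from the combined sums.

```python
BLOCK = 10  # hypothetical block size, taken from the comment


class Sums:
    """Additive running sums for a least-squares line."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y


class TwoBucketFit:
    def __init__(self):
        self.a = Sums()  # current, partially filled block
        self.b = Sums()  # previous full block

    def add(self, x, y):
        self.a.add(x, y)
        if self.a.n == BLOCK:
            # 'a' is full: it replaces 'b' (the older block is dropped) and 'a' restarts.
            self.b, self.a = self.a, Sums()

    def line(self):
        # Combine both buckets; returns (slope, yint, number of points in the window).
        n = self.a.n + self.b.n
        sx = self.a.sx + self.b.sx
        sy = self.a.sy + self.b.sy
        sxx = self.a.sxx + self.b.sxx
        sxy = self.a.sxy + self.b.sxy
        x_mean, y_mean = sx / n, sy / n
        slope = (sxy - sx * y_mean) / (sxx - sx * x_mean)
        return slope, y_mean - slope * x_mean, n


fit = TwoBucketFit()
for x in range(20):
    fit.add(x, x if x < 10 else 5 * x + 1)  # the trend changes at x = 10
print(fit.line())  # (5.0, 1.0, 10): only the latest block is in the window
```

Right after a copy the window holds 10 points, and it grows to 19 before the next copy, which matches the "latest 10-19 points" described above. As with any fit, the window needs at least two distinct X values.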
