
Using the algorithm described here to compute the variance of streaming data, I want to compute a moving variance that, like a moving average, gives older data less weight.

For convenience, here is the algorithm from the wiki page:

def weighted_incremental_variance(dataWeightPairs):
    wSum = 0
    mean = 0
    S = 0

    for x, w in dataWeightPairs:  # Alternatively "for x, w in zip(data, weights):"
        wSum = wSum + w
        meanOld = mean
        mean = meanOld + (w / wSum) * (x - meanOld)
        S = S + w * (x - meanOld) * (x - mean)

    variance = S / wSum
    return variance
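As a quick sanity check of the update rule above, the following self-contained sketch (the helper `wvar` simply re-implements the snippet) compares the incremental result against a direct computation of the weighted population variance:

```python
def wvar(pairs):
    # Incremental weighted mean/variance, as in the snippet above.
    wsum = mean = s = 0.0
    for x, w in pairs:
        wsum += w
        mean_old = mean
        mean += (w / wsum) * (x - mean_old)
        s += w * (x - mean_old) * (x - mean)
    return s / wsum

pairs = [(1.0, 2.0), (2.0, 1.0), (4.0, 3.0)]

# Direct computation of the weighted population variance, for comparison.
wsum = sum(w for _, w in pairs)
mu = sum(w * x for x, w in pairs) / wsum
direct = sum(w * (x - mu) ** 2 for x, w in pairs) / wsum

assert abs(wvar(pairs) - direct) < 1e-12
```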

Altering this algorithm for a moving mean is easy (I think): just divide wSum by some factor > 1 at each step (the larger the factor, the more forgetful it becomes). But given this factor, how would I fix S to act accordingly?

I could divide it by the same factor as well, but that doesn't feel right...

1 Answer


To put this into symbolic terms, you want to compute a weighted moving average of the $n$ previous terms with geometric weights, by

$y(i) =c_n\sum_{j=0}^{n-1} w^j x_{i-j} $, where $c_n$ is chosen so that $c_n\sum_{j=0}^{n-1} w^j =1 $.

This is to ensure that the weighted average of a constant series returns the same value.

Getting $c_n$ is straightforward.

If $w=1$, $c_n = 1/n$.

If $w \ne 1$, $\sum_{j=0}^{n-1} w^j =\dfrac{1-w^n}{1-w} $ so $c_n =\dfrac{1-w}{1-w^n} $.
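The normalization is easy to check numerically; a minimal sketch (with arbitrarily chosen `n` and `w`):

```python
n, w = 5, 0.8
c_n = (1 - w) / (1 - w**n)  # normalizer for the case w != 1

# The weights c_n * w^j over a window of n terms should sum to 1.
total = c_n * sum(w**j for j in range(n))
assert abs(total - 1.0) < 1e-12
```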

To do this efficiently, we want to get $y(i+1)$ in terms of $y(i)$. (What follows is fairly standard subscript manipulation.)

$$\begin{align*} y(i+1) &=c_n\sum_{j=0}^{n-1} w^j x_{i+1-j}\\ &=c_n\left(x_{i+1}+\sum_{j=1}^{n-1} w^j x_{i+1-j}\right)\\ &=c_n\left(x_{i+1}+\sum_{j=0}^{n-2} w^{j+1} x_{i-j}\right)\\ &=c_n\left(x_{i+1}+\sum_{j=0}^{n-1} w^{j+1} x_{i-j}-w^{n}x_{i-n+1}\right)\\ &=c_n\left(x_{i+1}+w\sum_{j=0}^{n-1} w^{j} x_{i-j}-w^{n}x_{i-n+1}\right)\\ &=c_n\left(x_{i+1}+w\,y(i)/c_n-w^{n}x_{i-n+1}\right)\\ &=w\,y(i)+c_n\left(x_{i+1}-w^{n}x_{i-n+1}\right) \end{align*}$$

While this does require that the last $n+1$ values of $x_i$ be stored (a circular buffer handles this quite nicely), the update takes only constant time instead of the $\Theta(n)$ time of the straightforward computation.
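The recurrence can be sketched with a `deque` as the circular buffer (a sketch, not code from the answer; for simplicity it treats values before the start of the stream as $0$):

```python
from collections import deque

def moving_weighted_avg(xs, n, w):
    """Yield y(i) via the O(1) update y(i+1) = w*y(i) + c_n*(x_{i+1} - w^n * x_{i-n+1}).

    Sketch only: values before the start of the stream are treated as 0.
    """
    c_n = 1.0 / n if w == 1 else (1 - w) / (1 - w**n)
    buf = deque([0.0] * n, maxlen=n)  # circular buffer of the last n values
    y = 0.0
    for x in xs:
        oldest = buf[0]               # x_{i-n+1}, about to leave the window
        y = w * y + c_n * (x - w**n * oldest)
        buf.append(x)                 # maxlen=n discards the oldest value
        yield y

ys = list(moving_weighted_avg([1.0, 2.0, 3.0], n=2, w=0.5))
```

Each step reads only the newest and oldest values, so the cost per update is independent of $n$.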

Note that doing this efficiently requires that the weights be of the form $w^j$. If the weights are arbitrary, the full computation has to be redone each time.

It would be an interesting problem to determine the most general weights such that $y(i+1)$ could be computed from $y(i)$ in constant time independent of $n$.

I think that I'll propose this as a problem.

  • Nice detailed answer, though I'm not sure we are on the same page; I was looking for a moving variance, not a moving average... (2017-02-03)
  • Once you get the successive values, you can use an online method, such as Kahan's, to get the variance. (2017-02-03)
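For the moving variance the question originally asked about, one common approach (a sketch of the standard exponentially weighted recurrences, not code from this thread; `alpha` is a hypothetical forgetting parameter) is:

```python
def ew_mean_var(xs, alpha):
    """Exponentially weighted mean and variance with smoothing factor alpha.

    Sketch only: each new sample moves the mean by alpha * (x - mean),
    and the variance estimate decays by (1 - alpha) while absorbing the
    new squared deviation.
    """
    mean = var = 0.0
    for x in xs:
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return mean, var

# On a constant stream the mean converges to the constant and the
# variance estimate decays toward zero.
m, v = ew_mean_var([3.0] * 100, 0.1)
```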