61
$\begingroup$

How can I compute the standard deviation in an incremental way (using the new value and the last computed mean and/or std deviation) ?

for the non incremental way, I just do something like: $$S_N=\sqrt{\frac1N\sum_{i=1}^N(x_i-\overline{x})^2}.$$

mean = Mean(list) for i = 0 to list.size    stdev = stdev + (list[i] - mean)^2 stdev = sqrRoot( stdev / list.size ) 

8 Answers 8

48

I think the easiest way to do this is with an orthogonality trick. I'll show how to incrementally compute the variance instead of the standard deviation. Let $X_1, X_2, ...$ be an iid sequence of random variables with $\bar X = n^{-1} \sum_{j = 1} ^ n X_j$ and $s^2_n$ defined similarly as the $n$'th sample variance (I use a denominator of $n-1$ instead of $n$ in your picture to keep things unbiased for the variance, but you can use the same argument just adding $1$ to all the terms with $n$ in them). First write $$ s^2_n = \frac{\sum_{j = 1} ^ n (X_j - \bar X_n)^2}{n - 1} = \frac{\sum_{j = 1} ^ n (X_j - \bar X_{n - 1} + \bar X_{n - 1} - \bar X_n)^2}{n - 1}. $$ Expand this to get $$ s^2_n = \frac{(n - 2)s^2_{n - 1} + (n - 1) (\bar X_{n - 1} - \bar X_n)^2 + 2 \sum_{j = 1} ^ {n - 1} (X_j - \bar X_{n - 1})(\bar X_{n - 1} - \bar X_n) + (X_n - \bar X_{n})^2}{n - 1} $$ and it is easy to show that the summation term above is equal to $0$ which gives $$ s^2_n = \frac{(n - 2)s^2_{n - 1} + (n - 1)(\bar X_{n - 1} - \bar X_n)^2 + (X_n - \bar X_{n})^2}{n - 1}. $$

EDIT: I assumed you already have an incremental expression for the sample mean. It is much easier to get that: $\bar X_n = n^{-1}[X_n + (n-1)\bar X_{n-1}]$.

13

The standard deviation is a function of the totals $T_{\alpha}=\sum_{i=1}^{N}x_{i}^{\alpha}$ for $\alpha=0,1,2$, each of which can be calculated incrementally in an obvious way. In particular, $E[X]=T_{1}/T_{0}$ and $E[X^2]=T_{2}/T_{0}$, and the standard deviation is $ \sigma = \sqrt{\text{Var}[X]} = \sqrt{E[X^2]-E[X]^2} = \frac{1}{T_0}\sqrt{T_{0}T_{2}-T_{1}^2}. $ By maintaining totals of higher powers ($T_{\alpha}$ for $\alpha \ge 3$), you can derive similar "incremental" expressions for the skewness, kurtosis, and so on.

  • 1
    @Phil: the formula I gave, $\sqrt{E[X^2]-E[X]^2}$, is the population standard deviation. Remember that the sample standard deviation differs from this by a factor of $\sqrt{n/(n-1)}$. To get the sample standard deviation in your case you would multiply by $\sqrt{3/2}$, giving the value of $1$ that you were expecting.2018-10-22
8

What you refer to as an incremental computation is very close to the computer scientist's notion of an online algorithm. There is in fact a well-known online algorithm for computing the variance and thus square root of a sequence of data, documented here.

6

Another way of saying the above is that you need to keep a count, an incremental sum of values and an incremental sum of squares.
Let $N$ be the count of values seen so far, $S = \sum_{1,N} x_i$ and $Q = \sum_{1,N} x_i^2$ (where both $S$ and $Q$ are maintained incrementally). Then any stage,
the mean is $\frac{S}{N}$
and the variance is $\frac{Q}{N} - \left( \frac{S}{N}\right)^2$
An advantage of this method is that it is less prone to rounding errors after a long series of calculations.

  • 0
    indeed (let $X=S/N$ be mean): $s^2=1/N\sum_{i=0}^n (x_i-X)^2 = 1/N\sum_{i=0}^n (x_i^2 - 2x_iX+X^2)=1/N\sum_{i=0}^n x_i^2-1/N\sum_{i=0}^n 2x_iX+1/N\sum_{i=0}^n X^2=Q/N-2X^2+X^2=Q/N-X^2=Q/N-(S/N)^2$2017-04-26
6

Forgive my poor math background, what I need is detail!

I added my progress here for someone like me.

$ s_n^2=\frac {\sum_{i=1}^{n}(x_i-\bar{x}_n)^2}{n-1} \\ = \frac {\sum_{i=1}^n(x_i - \bar{x}_{n-1} + \bar{x}_{n-1} - \bar{x}_n)^2}{n-1} \\ = \frac {\sum_{i=1}^{n}(x_i - \bar{x}_{n-1})^2 + 2\sum_{i=1}^n(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + \sum_{i=1}^n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\ = \frac {(\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})^2 + (x_n - \bar{x}_{n-1})^2) + (2\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n)) + \sum_{i=1}^n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + (x_n - \bar{x}_{n-1})^2 + 0 + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n)) + n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + x_n^2 - 2x_n\bar{x}_{n-1} + \bar{x}_{n-1}^2 + 2x_n \bar{x}_{n-1} - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + x_n^2 + \bar{x}_{n-1}^2 - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + x_n^2 - 2x_n\bar{x}_{n-1} + \bar{x}_{n-1}^2 + 2x_n \bar{x}_{n-1} - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + (x_n^2 - 2x_n\bar{x}_n + \bar{x}_n^2) + (n-1)(\bar{x}_{n-1}^2 - 2\bar{x}_{n-1}\bar{x}_n + \bar{x}_n^2)} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + (n-1)(\bar{x}_{n-1} - \bar{x}_n)^2 + (x_n - \bar{x}_n)^2} {n-1} \\ = \frac {n-2}{n-1}s_{n-1}^2 + (\bar{x}_{n-1} - \bar{x}_n)^2 + \frac {(x_n - \bar{x}_n)^2}{n-1} $

and

$ (\bar{x}_{n-1} - \bar{x}_n)^2 + \frac {(x_n - \bar{x}_n)^2}{n-1} \\ = (\bar{x}_{n-1} - \frac {x_n + (n-1)\bar{x}_{n-1}}{n})^2 + \frac {(x_n - \frac {x_n + (n-1)\bar{x}_{n-1}}{n})^2}{n-1} \\ = \frac {1}{n} (x_n - \bar{x}_{n-1})^2 $

so

$ s_n^2 = \frac{n-2}{n-1}s_{n-1}^2 + \frac{1}{n}(x_n - \bar{x}_{n-1})^2 $

  • 0
    Awesome! Thanks!2018-02-24
2

If it is of any help I found what seems to be a much nicer way here wikipedia - online algorithm .

The following is how I have implemented it in java. It has work very well for me. Hope it is clear enough, Please feel free to ask me questions about it.

/**  * standardDeviation() - designed to calculate the standard deviation of a data set incrementally by taking the last entered value and the previous sum of differences to the mean recorded.  * (i.e; upon adding a value to the data set this function should immediately be called)  *   * NOTE: do not call this function if the data set size it less than 2 since standard deviation cannot be calculated on a single value  * NOTE: sum_avg, sum_sd and avg are all static variables  * NOTE: on attempting to use this on another set following previous use, the static values will have to be reset**  *   * @param vector - List - data with only one additional value from previous method call  * @return updated value for the Standard deviation  */ public static double standardDeviation(List vector) {        double N = (double) vector.size();                  //size of the data set     double oldavg = avg;                                //save the old average     avg = updateAverage(vector);                        //update the new average      if(N==2.0)                                          //if there are only two, we calculate the standard deviation using the standard formula      {                                                                        for(double d:vector)                            //cycle through the 2 elements of the data set - there is no need to use a loop here, the set is quite small to just do manually         {             sum_sd += Math.pow((Math.abs(d)-avg), 2);   //sum the following according to the formula         }     }     else if(N>2)                                        //once we have calculated the base sum_std       {            double newel = (vector.get(vector.size()-1));   //get the latest addition to the data set          sum_sd = sum_sd + (newel - oldavg)*(newel-avg); //https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm     }     return Math.sqrt((sum_sd)/(N));                     //N or N-1 depends on your choice of sample of population standard deviation  }  /**  * simplistic method for incrementally calculating the mean of a data set  *   * @param vector - List - data with only one additional value from previous method call  * @return updated value for the mean of the given data set  */ public static double updateAverage(List vector) {     if(vector.size()==2){         sum_avg = vector.get(vector.size()-1) + vector.get(vector.size()-2);     }     else{         sum_avg += vector.get(vector.size()-1);     }       return sum_avg/(double)vector.size();  }