54
$\begingroup$

How can I compute the standard deviation in an incremental way (using the new value and the last computed mean and/or std deviation) ?

for the non incremental way, I just do something like:

http://upload.wikimedia.org/wikipedia/en/math/c/c/7/cc79944cf31b14406acd0a8ec8498688.png

mean = Mean(list) for i = 0 to list.size    stdev = stdev + (list[i] - mean)^2 stdev = sqrRoot( stdev / list.size ) 

6 Answers 6

43

I think the easiest way to do this is with an orthogonality trick. I'll show how to incrementally compute the variance instead of the standard deviation. Let $X_1, X_2, ...$ be an iid sequence of random variables with $\bar X = n^{-1} \sum_{j = 1} ^ n X_j$ and $s^2_n$ defined similarly as the $n$'th sample variance (I use a denominator of $n-1$ instead of $n$ in your picture to keep things unbiased for the variance, but you can use the same argument just adding $1$ to all the terms with $n$ in them). First write $$ s^2_n = \frac{\sum_{j = 1} ^ n (X_j - \bar X_n)^2}{n - 1} = \frac{\sum_{j = 1} ^ n (X_j - \bar X_{n - 1} + \bar X_{n - 1} + \bar X_n)^2}{n - 1}. $$ Expand this to get $$ s^2_n = \frac{(n - 2)s^2_{n - 1} + (n - 1) (\bar X_{n - 1} - \bar X_n)^2 + 2 \sum_{j = 1} ^ {n - 1} (X_j - \bar X_{n - 1})(\bar X_{n - 1} - \bar X_n) + (X_n - \bar X_{n} ^ 2)}{n - 1} $$ and it is easy to show that the summation term above is equal to $0$ which gives $$ s^2_n = \frac{(n - 2)s^2_{n - 1} + (n - 1)(\bar X_{n - 1} - \bar X_n)^2 + (X_n - \bar X_{n})^2}{n - 1}. $$

EDIT: I assumed you already have an incremental expression for the sample mean. It is much easier to get that: $\bar X_n = n^{-1}[X_n + (n-1)\bar X_{n-1}]$.

  • 19
    Nice answer. Note that your formula can be simplified to $$s_n^2={n-2\over n-1}s^2_{n-1}+{1\over n}(X_n-\bar X_{n-1})^2.$$2012-01-27
  • 1
    Thanks Guy for the great explaination (2). it is exactly what i was looking for. but can you please elaborate how the summation term equals zero as i seem to be missing a trick...2013-01-07
  • 1
    @user55490: $$ \sum_{j = 1} ^ {n - 1} (X_j - \bar X_{n - 1}) = (sum_{j = 1} ^ {n - 1} (X_j) - (n - 1) \bar X_{n - 1}) = (n - 1) (\bar X_{n - 1} - \bar X_{n - 1})$$2015-10-26
  • 2
    See also [this wikipedia post](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm)2017-02-01
  • 3
    Please, note that the simplified formula of @user940 can only be used if we are computing the sample variance. If we want to use a custom DDOF (Delta Degrees Of Freedom), the formula becomes: $$s^2_n = \frac{(n - 1 - d)s^2_{n - 1} + (n - 1)(\bar X_{n - 1} - \bar X_n)^2 + (X_n - \bar X_{n})^2}{n - d}$$ Where $d$ is the DDOF, usually $d = 0$ for population variance and $d = 1$ for sample variance.2018-02-24
10

The standard deviation is a function of the totals $T_{\alpha}=\sum_{i=1}^{N}x_{i}^{\alpha}$ for $\alpha=0,1,2$, each of which can be calculated incrementally in an obvious way. In particular, $E[X]=T_{1}/T_{0}$ and $E[X^2]=T_{2}/T_{0}$, and the standard deviation is $$ \sigma = \sqrt{\text{Var}[X]} = \sqrt{E[X^2]-E[X]^2} = \frac{1}{T_0}\sqrt{T_{0}T_{2}-T_{1}^2}. $$ By maintaining totals of higher powers ($T_{\alpha}$ for $\alpha \ge 3$), you can derive similar "incremental" expressions for the skewness, kurtosis, and so on.

  • 1
    I think the @user wanted a formula that lets you incrementally calculate the standard deviation as you increase the data set in a way that is computationally efficient, not this.2012-01-27
  • 1
    This is actually computationally efficient, because you just need to count the number of samples, the sum of the samples, and the sum of the squares of each sample. Once you need to "read" the standard deviation (or average,) you calculate the square root, which costs two multuplies, one subtraction, one square root, and one divide. On most modern CPUs, this is highly efficient (and very cache friendly.)2014-08-01
  • 0
    Furthermore, this methods allows to compute easily a running standard deviation as well, which is what i was looking for. in addition to adding the new value to the sum of values and squares, remove the oldest value from them as well. Sorry for the late comment, but I found this page after googling this issue and I hope it will let future people find this more easily.2016-02-11
  • 0
    I have tried to implement this in R. However, I seem to am doing something wrong as I don't get correct standard deviation values. E.g. for the three x values 1, 2, and 3 the standard deviation should be 1. T0 = 3 (1^0 + 2^0 + 3^0), T1 = 6 (1^1 + 2^1 + 3^1), T2 = 14 (1^2 + 2^2 + 3^2). The final standard deviation value according to the formula above would be 0.8165 which is different from 1. Could you tell me where I am making a mistake?2018-10-18
  • 1
    @Phil: the formula I gave, $\sqrt{E[X^2]-E[X]^2}$, is the population standard deviation. Remember that the sample standard deviation differs from this by a factor of $\sqrt{n/(n-1)}$. To get the sample standard deviation in your case you would multiply by $\sqrt{3/2}$, giving the value of $1$ that you were expecting.2018-10-22
7

What you refer to as an incremental computation is very close to the computer scientist's notion of an online algorithm. There is in fact a well-known online algorithm for computing the variance and thus square root of a sequence of data, documented here.

5

Another way of saying the above is that you need to keep a count, an incremental sum of values and an incremental sum of squares.
Let $N$ be the count of values seen so far, $S = \sum_{1,N} x_i$ and $Q = \sum_{1,N} x_i^2$ (where both $S$ and $Q$ are maintained incrementally). Then any stage,
the mean is $\frac{S}{N}$
and the variance is $\frac{Q}{N} - \left( \frac{S}{N}\right)^2$
An advantage of this method is that it is less prone to rounding errors after a long series of calculations.

  • 0
    This is the best way to achieve the goal - single pass calculate with fixed amount of storage required. But is it mathematically valid?2016-08-25
  • 0
    indeed (let $X=S/N$ be mean): $s^2=1/N\sum_{i=0}^n (x_i-X)^2 = 1/N\sum_{i=0}^n (x_i^2 - 2x_iX+X^2)=1/N\sum_{i=0}^n x_i^2-1/N\sum_{i=0}^n 2x_iX+1/N\sum_{i=0}^n X^2=Q/N-2X^2+X^2=Q/N-X^2=Q/N-(S/N)^2$2017-04-26
4

Forgive my poor math background, what I need is detail!

I added my progress here for someone like me.

$$ s_n^2=\frac {\sum_{i=1}^{n}(x_i-\bar{x}_n)^2}{n-1} \\ = \frac {\sum_{i=1}^n(x_i - \bar{x}_{n-1} + \bar{x}_{n-1} - \bar{x}_n)^2}{n-1} \\ = \frac {\sum_{i=1}^{n}(x_i - \bar{x}_{n-1})^2 + 2\sum_{i=1}^n(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + \sum_{i=1}^n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\ = \frac {(\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})^2 + (x_n - \bar{x}_{n-1})^2) + (2\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n)) + \sum_{i=1}^n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + (x_n - \bar{x}_{n-1})^2 + 0 + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n)) + n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + x_n^2 - 2x_n\bar{x}_{n-1} + \bar{x}_{n-1}^2 + 2x_n \bar{x}_{n-1} - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + x_n^2 + \bar{x}_{n-1}^2 - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + x_n^2 - 2x_n\bar{x}_{n-1} + \bar{x}_{n-1}^2 + 2x_n \bar{x}_{n-1} - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + (x_n^2 - 2x_n\bar{x}_n + \bar{x}_n^2) + (n-1)(\bar{x}_{n-1}^2 - 2\bar{x}_{n-1}\bar{x}_n + \bar{x}_n^2)} {n-1} \\ = \frac {(n-2)s_{n-1}^2 + (n-1)(\bar{x}_{n-1} - \bar{x}_n)^2 + (x_n - \bar{x}_n)^2} {n-1} \\ = \frac {n-2}{n-1}s_{n-1}^2 + (\bar{x}_{n-1} - \bar{x}_n)^2 + \frac {(x_n - \bar{x}_n)^2}{n-1} $$

and

$$ (\bar{x}_{n-1} - \bar{x}_n)^2 + \frac {(x_n - \bar{x}_n)^2}{n-1} \\ = (\bar{x}_{n-1} - \frac {x_n + (n-1)\bar{x}_{n-1}}{n})^2 + \frac {(x_n - \frac {x_n + (n-1)\bar{x}_{n-1}}{n})^2}{n-1} \\ = \frac {1}{n} (x_n - \bar{x}_{n-1})^2 $$

so

$$ s_n^2 = \frac{n-2}{n-1}s_{n-1}^2 + \frac{1}{n}(x_n - \bar{x}_{n-1})^2 $$

  • 0
    Awesome! Thanks!2018-02-24
1

If it is of any help I found what seems to be a much nicer way here wikipedia - online algorithm .

The following is how I have implemented it in java. It has work very well for me. Hope it is clear enough, Please feel free to ask me questions about it.

/**  * standardDeviation() - designed to calculate the standard deviation of a data set incrementally by taking the last entered value and the previous sum of differences to the mean recorded.  * (i.e; upon adding a value to the data set this function should immediately be called)  *   * NOTE: do not call this function if the data set size it less than 2 since standard deviation cannot be calculated on a single value  * NOTE: sum_avg, sum_sd and avg are all static variables  * NOTE: on attempting to use this on another set following previous use, the static values will have to be reset**  *   * @param vector - List - data with only one additional value from previous method call  * @return updated value for the Standard deviation  */ public static double standardDeviation(List vector) {        double N = (double) vector.size();                  //size of the data set     double oldavg = avg;                                //save the old average     avg = updateAverage(vector);                        //update the new average      if(N==2.0)                                          //if there are only two, we calculate the standard deviation using the standard formula      {                                                                        for(double d:vector)                            //cycle through the 2 elements of the data set - there is no need to use a loop here, the set is quite small to just do manually         {             sum_sd += Math.pow((Math.abs(d)-avg), 2);   //sum the following according to the formula         }     }     else if(N>2)                                        //once we have calculated the base sum_std       {            double newel = (vector.get(vector.size()-1));   //get the latest addition to the data set          sum_sd = sum_sd + (newel - oldavg)*(newel-avg); //https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm     }     return Math.sqrt((sum_sd)/(N));                     //N or N-1 depends on your choice of sample of population standard deviation  }  /**  * simplistic method for incrementally calculating the mean of a data set  *   * @param vector - List - data with only one additional value from previous method call  * @return updated value for the mean of the given data set  */ public static double updateAverage(List vector) {     if(vector.size()==2){         sum_avg = vector.get(vector.size()-1) + vector.get(vector.size()-2);     }     else{         sum_avg += vector.get(vector.size()-1);     }       return sum_avg/(double)vector.size();  }