4
$\begingroup$

I was looking up how standard deviations work, and tried to do it by hand, but what I'm getting isn't matching up with what I have in SQL Server.

My population is: 1,2,3,4,5,6,7,8,9. I end up with 2.58 using the method described on wikipedia. However, SQL Server AND Microsoft Excel come up with 2.73.

The final part of my equation looks like this: sqrt(60/9)

So, am I messing something up? If so, where?

  • 3
    The SQL Server and Excel seem to use the estimator with Bessel's correction: $$\sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \bar x)^2 } .$$ See [here](http://en.wikipedia.org/wiki/Bessel%27s_correction).2011-10-10
  • 0
    You have spotted an important thing and should watch the videos explaining the need for Bessel correction, https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance2013-11-20
  • 0
    You have spotted an important thing and should watch the videos explaining the need for Bessel correction, https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance. Also, note that there are two std_dev functions in Excel: `stdev.p` and `stdev.s`. The latter involves the Bessel correction, which I think you should do if you want to estimate the population variance using the sample variance.2013-11-20

1 Answers 1

6

If you look closely at the Wikipedia article that you quoted, you will see that you computed either what they call the "population standard deviation" or what they call the "standard deviation of the sample," and not what they call the "sample standard deviation!" To find the sample standard deviation, we need to divide $60$ by $8$, not $9$, and then take the square root. With your numbers, that turns out to be about $2.7386$.

The terminological distinction that Wikipedia makes between "standard deviation of the sample" and "sample standard deviation" is not universal. But let's deal first with population standard deviation. If $X$ is a random variable with mean $\mu$, then the variance of $X$ is equal to $$E(X-\mu)^2.$$ In particular, if the random variable $X$ has uniform distribution on the set $\{1,2,\dots,9\}$, then the variance of $X$ is indeed the $60/9$ that you calculated.

With samples, the situation is more complicated. There are two commonly used estimators of the population variance. If we have a sample $X_1,X_2, \dots, X_n$ of size $n$, then one standard estimator for the population variance is $$\frac{1}{n}\sum_{i=1}^n (X_i-\overline{X})^2$$ and the other is $$\frac{1}{n-1}\sum_{i=1}^n (X_i-\overline{X})^2,$$ where in each case $\overline{X}$ is the sample mean $\frac{1}{n}\sum_{i=1}^n X_i$. There are technical reasons in favour of each. The second one is an unbiased estimator of the population variance, when that exists, and is more commonly preferred. (The situation is more complicated than that. Although the first estimator is on average wrong, it is on average, in the mean-square sense, closer to the truth! And in general, the square root of neither of our expressions is an unbiased estimator of the population standard deviation.)

When $n$ is large, there is not much difference between the two estimators, so we don't need to pay attention to what version is being used. But for $n=9$, it does make a significant difference, and we need to be aware of what formula a piece of software uses.

  • 0
    Just to elaborate on the different standard deviations, the first is what we want to think of as the standard deviation. For the second, imagine you have a large population from which you are sampling (e.g. you are surveying 1000 people in a city of a million). In that case, you don't care some much about the sample, but rather you want to use the sample to infer things about the total population. Unbiased estimators of statistics (the second std dev being one) are meant to estimate properties of the total population from the sample. Use this to decide which std dev you want to use.2011-10-10
  • 1
    ["We might also comment that if the difference between $N$ and $N-1$ ever matters to you, then you are probably up to no good anyway — e.g., trying to substantiate a questionable hypothesis with marginal data."](http://books.google.com/books?id=DyykEZo4fwUC&pg=PA722)2011-10-10
  • 1
    @André: I think you mistakenly refer to "what they call the standard deviation of the sample, not the sample standard deviation". Wikipedia refers to the "sample standard deviation" (with Bessel's correction) and the _population_ standard deviation. The question above uses the word "population"!2011-10-11
  • 1
    It is misleading to say the unbiased estimator is "generally preferred". That seems to imply it's generally better. That is quite questionable. The estimator with the smallest mean squared error, if we assume an i.i.d. sample from a normal population, is $\frac{1}{n+1}\sum_{i=1}^n (X_i-\bar{X})^2$ (not a typo: $n+1$, not $n-1$) and that is even more biased than the biased estimator you mentioned. And in some instances an insistance on unbiasedness is disastrous.2011-10-11
  • 0
    @Michael Hardy: Thank you for the comments. I went back to the Wikipedia article, and quoted the three versions that it mentions, "population standard deviation," "standard deviation of the sample," and "sample standard deviation."2011-10-11
  • 0
    @André: You wrote: "The terminological distinction that Wikipedia makes between 'standard deviation of the sample' and 'sample standard deviation' is not universal." I find no such distinction being made in the Wikipedia article. Where do you find it?2011-10-11
  • 0
    @Michael Hardy: It is easy to find in the [Article]{http://en.wikipedia.org/wiki/Standard_deviation#Estimation) that the OP referred to, Section 3. It really is there! That is why in my editing of my answer, I used quotation marks.2011-10-11
  • 0
    So it is. The terminology makes sense but for the fact that the terms are so similar to each other that it seems predictable that confusion will ensue.2011-10-12
  • 0
    @Michael Hardy: It sounds like something one would make up to be funny. (And I would have invented it, if I had a sense of humor.)2011-10-12