
Suppose I have a huge data set and the only attribute I know about it is the variance (or, equivalently, the SD, since SD = $\sqrt{\text{Variance}}$). What conclusions can I draw about the set with reasonable certainty?

I don't know the mean, median, mode, presence of outliers or any other information.

Can I draw any conclusions beyond how much the values deviate from the mean?

  • 1
    See the Chebyshev inequality. 2012-07-17

1 Answer

1

The Chebyshev inequality tells you how much probability is guaranteed to lie within $k$ standard deviations of the mean for any distribution with finite variance. The bound is $1 - 1/k^2$, so for $k \le 1$ nothing is guaranteed, for $k = 2$ it guarantees 0.75, and for $k = 3$ approximately 0.89 (exactly 8/9). Contrast this with the normal distribution, which puts probability 0.68 within 1 standard deviation, 0.954 within 2, and 0.9973 within 3. The Chebyshev bound is always less than or equal to the actual probability for any particular distribution, because it has to hold for every distribution with finite variance.
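To check these figures yourself, here is a minimal Python sketch that computes the Chebyshev lower bound next to the exact normal coverage for the same $k$. The function names `chebyshev_lower_bound` and `normal_coverage` are my own, chosen for illustration; the normal coverage uses the standard identity $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$.

```python
import math


def chebyshev_lower_bound(k):
    """Guaranteed minimum probability within k standard deviations of the mean.

    Holds for any distribution with finite variance. For k <= 1 the
    formula 1 - 1/k^2 is zero or negative, so nothing is guaranteed.
    """
    if k <= 1:
        return 0.0
    return 1.0 - 1.0 / k**2


def normal_coverage(k):
    """Exact probability that a normal variable lies within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))


# Compare the distribution-free bound with the normal distribution.
for k in (1, 2, 3):
    bound = chebyshev_lower_bound(k)
    exact = normal_coverage(k)
    print(f"k={k}: Chebyshev >= {bound:.4f}, normal = {exact:.4f}")
```

The gap between the two columns shows how much you give up by knowing only the variance: the Chebyshev figure is the worst case over all finite-variance distributions, while the normal figure assumes a specific shape.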