
I have $\bar{x}_i$, for $1\leq i\leq m$, the sample means of $m$ samples, each of size $n$, of a random variable $X$. I want to estimate $\sigma_{\bar{X}}$ from this information. My guess is that the correct estimate is:

$$\hat\sigma_{\bar{X}}=\sqrt{\frac{\sum_{i=1}^m(\bar{x}_i-\bar{\bar{x}})^2}{m-1}},\qquad \bar{\bar{x}}=\frac{1}{m}\sum_{i=1}^m\bar{x}_i.$$

I would like to know whether this is correct and, if not, how to find the correct estimate.

(I don't have access to the $mn$ original values.)

  • You mention estimating the standard deviation $\sigma_{\bar X}$, but your displayed equation was for an estimate of $\sigma_{\bar X}^2$, so I edited your typesetting. Please re-edit or leave me a comment if this is not what you intended. (2017-02-22)

2 Answers

1

There is no such thing as a correct estimator, although sometimes there are optimal ones according to certain criteria.

Preaching aside, you can just think of your successive measurements of the sample mean as samples of the random variable $\bar X_n$. The estimator you wrote down is a very popular estimator of the standard deviation for $m$ samples of a random variable, so it's probably not a bad choice, provided the fairly weak assumptions under which this estimator has good properties actually hold.
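To make that concrete, here is a minimal Python sketch of the estimator from the question, treating the $m$ sample means as ordinary data. The function name `sd_of_means` and the numbers are made up for illustration; only the standard library is used.

```python
import statistics

def sd_of_means(sample_means):
    """Sample standard deviation of the m sample means (divisor m - 1)."""
    if len(sample_means) < 2:
        raise ValueError("need at least two sample means")
    # statistics.stdev uses the m - 1 divisor, matching the formula in the question.
    return statistics.stdev(sample_means)

# Hypothetical sample means from m = 5 batches (illustrative values only).
xbar = [10.2, 9.8, 10.5, 10.1, 9.9]
print(sd_of_means(xbar))
```

Nothing here uses $n$: the spread of the means is estimated directly, which is why the original $mn$ values are not needed.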

If the individual random variables comprising the groups of size $n$ are known to be independent and you have all the data (not just the sample means), you could also just compute the sample variance of the whole batch and use the fact that $\operatorname{Var}(\bar X) = \operatorname{Var}(X)/n$.
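If the raw values were available, that alternative might look like the sketch below. It assumes all $mn$ values are independent draws from the same distribution (same mean and variance in every batch); the name `se_mean_from_raw` and the data are hypothetical.

```python
import math
import statistics

def se_mean_from_raw(batches):
    """Estimate sigma_Xbar as sqrt(s^2 / n), with s^2 computed over all m*n values."""
    n = len(batches[0])
    if any(len(b) != n for b in batches):
        raise ValueError("all batches must have the same size n")
    # Flatten the m batches into a single list of m*n observations.
    all_values = [x for b in batches for x in b]
    # Var(Xbar) = Var(X) / n, so plug in the sample variance of the whole batch.
    return math.sqrt(statistics.variance(all_values) / n)

# Hypothetical raw data: m = 2 batches of size n = 3.
print(se_mean_from_raw([[1, 2, 3], [4, 5, 6]]))
```

Pooling everything into one variance is only sensible under the identical-distribution assumption; if the batch means drift, this estimate is inflated.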

1

This is a reasonable way to estimate the standard deviation $\sigma_{\bar X}$ of the $m$ means $\bar X_i$. This method works especially well for normal data. Even if the original observations ($n$ of them used to find each $\bar X_i$) are not exactly normal, the Central Limit Theorem implies that the means may be nearly normal. Generally speaking, the larger $n$ is, the more nearly normal the means are.

But there are some exceptions. If the original observations were exponential (e.g., waiting times or reaction times) then the $\bar X_i$ would have a gamma distribution, not a normal distribution. Also, if the original observations were Poisson (e.g., counts of rare events) then the quantities $n\bar X_i$ would also be Poisson. In both of these situations one can find better ways to estimate the variance or standard deviation, based on the means (rather than on the formula you give).
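As one possible reading of "better ways" in these two cases (my illustration, not necessarily the only option): both families tie the variance to the mean. For exponential data with mean $\mu$, $\sigma_X=\mu$, so $\sigma_{\bar X}=\mu/\sqrt{n}$; for Poisson data with mean $\lambda$, $\operatorname{Var}(X)=\lambda$, so $\sigma_{\bar X}=\sqrt{\lambda/n}$. Plugging in the grand mean gives the sketch below. The function names are hypothetical, and $n$ must be known.

```python
import math
import statistics

def se_mean_exponential(sample_means, n):
    """Plug-in estimate of sigma_Xbar assuming exponential observations."""
    grand_mean = statistics.mean(sample_means)  # estimates mu
    return grand_mean / math.sqrt(n)            # sigma_X = mu for exponential data

def se_mean_poisson(sample_means, n):
    """Plug-in estimate of sigma_Xbar assuming Poisson observations."""
    grand_mean = statistics.mean(sample_means)  # estimates lambda
    return math.sqrt(grand_mean / n)            # Var(X) = lambda for Poisson data

# Hypothetical sample means (illustrative values only).
print(se_mean_exponential([2.0, 4.0], n=4))
print(se_mean_poisson([2.0, 4.0], n=3))
```

These estimates rely entirely on the distributional assumption being right; if it is wrong, the formula from the question is the safer choice.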

So knowing the nature of the underlying distribution might lead you to better ways of estimating: either estimates aimed more accurately at the right target value, or estimates with more precise performance, landing close to the target value with less scatter.

Also, if you have the $mn$ original values, you could test whether the variance stays the same from one batch (of $n$ observations) to another. Moreover, there might be better ways to estimate the underlying variability by using all $mn$ values in a slightly different way.
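A rough sketch of both ideas, assuming the raw batches are at hand and all have the same size $n$. The ratio of the largest to the smallest batch variance (Hartley's $F_{\max}$ statistic) gives a quick check on constant variance; judging whether it is "too large" requires tabulated critical values, which are not included here. Averaging the within-batch variances is one way of using all $mn$ values that is not distorted by shifts in the batch means. All names and data below are hypothetical.

```python
import math
import statistics

def batch_variances(batches):
    """Sample variance (divisor n - 1) within each batch."""
    return [statistics.variance(b) for b in batches]

def fmax_ratio(batches):
    """Largest batch variance divided by the smallest (Hartley's F_max statistic)."""
    v = batch_variances(batches)
    return max(v) / min(v)  # raises ZeroDivisionError if some batch has zero variance

def se_mean_pooled_within(batches):
    """Estimate sigma_Xbar from the average within-batch variance (equal batch sizes)."""
    n = len(batches[0])
    if any(len(b) != n for b in batches):
        raise ValueError("all batches must have the same size n")
    pooled_var = statistics.mean(batch_variances(batches))
    return math.sqrt(pooled_var / n)

# Hypothetical raw data: m = 2 batches of size n = 3.
data = [[1, 2, 3], [2, 4, 6]]
print(fmax_ratio(data))
print(se_mean_pooled_within(data))
```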

In summary, if (1) you have reason to believe that the $mn$ original values might not be (at least close to) normally distributed, especially with very small $n$, or (2) you still have access to these original values, you should describe the circumstances more precisely, and maybe one of us can recommend a better way to estimate $\sigma$.