Is there a way to update a normal distribution when given new data points without knowing the original data points? What is the minimum information that would need to be known? For example, if I know the mean, standard deviation, and the number of original data points, but not the values of those points themselves, is it possible?
Iteratively Updating a Normal Distribution
statistics
normal-distribution
-
Do you mean updating using Bayes' Theorem? – 2012-12-04
-
Maybe? I'm asking the question because I don't know. Let's say I had a list of 100 integers between 1 and 10, and they have a mean of 7.3 and a stddev of 1.1. Now I'm given a new number, say 8. Of course, if I still have the 100 integers, I now have 101 and can recalculate the mean and stddev, but what I'm wondering is whether I can do it without knowing the 100 integers, just knowing the mean and stddev and that there were 100. Can Bayes' Theorem do this? – 2012-12-04
-
The minimal information is just what you've listed: you need the zeroth, first, and second moments of the original data points; e.g., knowing their number, mean, and standard deviation is sufficient. – 2012-12-04
1 Answer
It is certainly possible. The best way to do it, avoiding some numerical precision issues, is to track two running values, updating each with the new $n$th observation $a_n$:
$$m_n = m_{n-1} + \frac{a_{n}-m_{n-1}}{n}$$
$$s_n = s_{n-1} + (a_n - m_{n-1})(a_n - m_n)$$
starting with $m_0 = s_0 = 0$. Then the mean of the first $n$ values is $m_n$, while the standard deviation is $\sqrt{\frac{s_n}{n}}$ or $\sqrt{\frac{s_n}{n-1}}$, depending on which denominator you usually use to calculate the standard deviation (the second needs $n \ge 2$). If you would prefer to store only the standard deviation $\sigma_{n-1}$ of the first $n-1$ values instead of $s_{n-1}$, you can recover it each time as $s_{n-1}=(n-1)\sigma_{n-1}^2$ if you use the first denominator, or $s_{n-1}=(n-2)\sigma_{n-1}^2$ if you use the second.
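For what it's worth, here is a minimal Python sketch of one update step, assuming you have stored only the count, mean, and standard deviation. The function name `update_normal` and the `ddof` parameter are my own choices for illustration (`ddof=0` for the denominator $n$, `ddof=1` for $n-1$). Note that the argument `n` in the code is the number of points seen *before* the new one, i.e. $n-1$ in the notation above.

```python
import math

def update_normal(n, mean, std, x, ddof=0):
    """Return (count, mean, std) after adding one observation x.

    n, mean, std summarise the points seen so far. ddof=0 means std
    uses the denominator n; ddof=1 means it uses n - 1.
    """
    # Recover the running sum of squared deviations from the stored std.
    s = (n - ddof) * std ** 2 if n > ddof else 0.0

    n_new = n + 1
    mean_new = mean + (x - mean) / n_new
    # Old mean in the first factor, new mean in the second.
    s_new = s + (x - mean) * (x - mean_new)

    # With ddof=1 the std is undefined for a single point; return 0.0 then.
    std_new = math.sqrt(s_new / (n_new - ddof)) if n_new > ddof else 0.0
    return n_new, mean_new, std_new


# The example from the comments: 100 points, mean 7.3, stddev 1.1, new value 8.
print(update_normal(100, 7.3, 1.1, 8))
```

For the example in the comments, the mean moves to about 7.307 and the standard deviation drops slightly to about 1.097 (using the denominator $n$). If you process many points, it is a bit cleaner to carry $s_n$ itself from step to step rather than converting to and from the standard deviation each time.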
-
Did you mean $a_n-m_n$ in the expression for $s_n$? – 2012-12-05
-
@Alex - yes - thanks – 2012-12-05