2

Is there a way to update a normal distribution when given new data points without knowing the original data points? What is the minimum information that would need to be known? For example, if I know the mean, standard deviation, and the number of original data points, but not the values of those points themselves, is it possible?

  • 0
    Do you mean updating using Bayes' Theorem? (2012-12-04)
  • 1
    Maybe? I'm asking the question because I don't know. Let's say I had a list of 100 integers between 1 and 10, and they have a mean of 7.3 and a stddev of 1.1. Now I'm given a new number, say 8. Of course, if I still have the 100 integers, I now have 101 and can recalculate the mean and stddev. What I'm wondering is whether I can do it without knowing the 100 integers, knowing only the mean and stddev and that there were 100. Can Bayes' Theorem do this? (2012-12-04)
  • 2
    The minimal information is just what you've listed: you need the zeroth, first, and second moments of the original data points; e.g., knowing their number, mean, and standard deviation is sufficient. (2012-12-04)

1 Answer

5

It is certainly possible. The best way, which avoids some numerical precision issues, is to track two running values, $m_n$ and $s_n$, and update both each time a new ($n$th) observation $a_n$ arrives:

$$m_n = m_{n-1} + \frac{a_{n}-m_{n-1}}{n}$$

$$s_n = s_{n-1} + (a_n - m_{n-1})(a_n - m_n)$$

starting with $m_0 = s_0 = 0$. Here $s_n$ is the sum of squared deviations of the first $n$ values from their mean. The mean of the first $n$ values is then $m_n$, and the standard deviation is $\sqrt{\frac{s_n}{n}}$ (population denominator) or $\sqrt{\frac{s_n}{n-1}}$ (sample denominator), depending on which denominator you usually use. If you would rather store the standard deviation $\sigma_{n-1}$ than $s_{n-1}$, you can recover $s_{n-1} = (n-1)\sigma_{n-1}^2$ (population denominator) or $s_{n-1} = (n-2)\sigma_{n-1}^2$ (sample denominator) before each update.
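
As a rough illustration of the recurrences above, here is a minimal Python sketch. The function names (`update`, `s_from_std`, `std_from_s`) are hypothetical, chosen only for this example, and the numbers are the ones from the question's comment (100 values, mean 7.3, stddev 1.1, new value 8). It assumes the quoted 1.1 was computed with the sample ($n-1$) denominator; pass `sample=False` if it was the population value.

```python
import math

def update(n, mean, s, x):
    """Fold one new observation x into the running values (n, m_n, s_n)."""
    n += 1
    delta = x - mean           # a_n - m_{n-1}
    mean += delta / n          # m_n = m_{n-1} + (a_n - m_{n-1}) / n
    s += delta * (x - mean)    # s_n = s_{n-1} + (a_n - m_{n-1})(a_n - m_n)
    return n, mean, s

def s_from_std(n, std, sample=True):
    """Recover the sum of squared deviations from a stored standard deviation."""
    return (n - 1 if sample else n) * std ** 2

def std_from_s(n, s, sample=True):
    """Convert the sum of squared deviations back to a standard deviation."""
    return math.sqrt(s / (n - 1 if sample else n))

# Numbers from the question; the sample denominator is an assumption.
n, mean, std = 100, 7.3, 1.1
s = s_from_std(n, std)
n, mean, s = update(n, mean, s, 8.0)
new_std = std_from_s(n, s)
print(n, mean, new_std)
```

Under that assumption, the updated mean comes out at roughly 7.307 and the updated standard deviation at roughly 1.097, without ever touching the original 100 values.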

  • 0
    Did you mean $a_n-m_n$ in the expression for $s_n$? (2012-12-05)
  • 0
    @Alex - yes - thanks (2012-12-05)