
So I have a data set $(x_{1},y_{1}), (x_{2},y_{2}),\dots,(x_{n},y_{n})$ and from it I have the values of $\sum x$, $\sum x^{2}$, $\sum y$, $\sum y^{2}$, $\sum xy$.

My question is, how do I find a normal distribution that best fits this data set and how do I use these values to calculate the standard deviation for the normal distribution?

Basically, given a data set, how do I find the values of the mean and standard deviation for the normal distribution of best fit? Are they the same as the mean and standard deviation of the data set?

  • 0
    @Hans Engler: I did some searching and found this, which might answer the question: [http://www.math.uri.edu/~pakula/452webs8/regress.pdf](http://www.math.uri.edu/~pakula/452webs8/regress.pdf). I am not sure exactly how to apply it, though. 2012-12-26

2 Answers

1

You also need $\sum x y$; otherwise you would exclude all the normal distributions in which $X$ and $Y$ are dependent.

The normal distribution that best fits the data is obtained by maximum likelihood estimation. It is the one whose mean and covariance matrix equal the empirical mean and empirical covariance matrix computed from your sums (normalized by $n$).
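
As a minimal sketch of that recipe, the Python snippet below turns the five sums into the maximum likelihood mean vector and covariance matrix of a bivariate normal. The function name `mle_bivariate_normal` and the four data points are made up for illustration; only the division by $n$ comes from the answer above.

```python
def mle_bivariate_normal(n, sx, sxx, sy, syy, sxy):
    """Maximum likelihood mean and covariance of a bivariate normal.

    Inputs are the sample size and the sums of x, x^2, y, y^2 and x*y.
    Every second moment is normalized by n (not n - 1).
    """
    mx, my = sx / n, sy / n
    vxx = sxx / n - mx * mx   # empirical variance of x
    vyy = syy / n - my * my   # empirical variance of y
    vxy = sxy / n - mx * my   # empirical covariance of x and y
    return (mx, my), ((vxx, vxy), (vxy, vyy))


# Hypothetical data, used only to produce the five sums.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
n = len(xs)

mean, cov = mle_bivariate_normal(
    n,
    sum(xs),
    sum(x * x for x in xs),
    sum(ys),
    sum(y * y for y in ys),
    sum(x * y for x, y in zip(xs, ys)),
)
print("mean:", mean)
print("covariance matrix:", cov)
```

The standard deviations of the fitted distribution are the square roots of the diagonal entries of the covariance matrix.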

  • 0
    @RickyT Do you know matrix algebra? 2012-12-27
1

You have the sufficient statistics for $\mu_X$, $\mu_Y$, $\sigma^2_X$ and $\sigma^2_Y$, so you can calculate their estimates directly. For the sample means, use

$$ \bar{x} = \frac{1}{n}\sum_{i = 1}^n x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i = 1}^n y_i, $$

and for the sample variances,

$$ s^2_x = \frac{1}{n-1} \sum_{i=1}^n\left(x_i - \bar{x} \right)^2 = \frac{\sum_{i=1}^n x_i^2}{n-1} - \frac{n\bar{x}^2}{n-1}, $$

$$ s^2_y = \frac{1}{n-1} \sum_{i=1}^n\left(y_i - \bar{y} \right)^2 = \frac{\sum_{i=1}^n y_i^2}{n-1} - \frac{n\bar{y}^2}{n-1}. $$

As others have mentioned, without $\sum{xy}$ you will not be able to estimate the covariance between $X$ and $Y$, which the regression tag in your question suggests you want.
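
A short Python sketch of these $n-1$ estimates, computed from the sums alone, might look like the following. The function name `sample_estimates` and the data are hypothetical. The covariance line assumes $\sum xy$ is available and relies on the standard identity $\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y}$.

```python
import math


def sample_estimates(n, sx, sxx, sy, syy, sxy):
    """Sample means, (n - 1)-normalized variances and covariance from sums."""
    xbar, ybar = sx / n, sy / n
    s2x = (sxx - n * xbar ** 2) / (n - 1)    # sample variance of x
    s2y = (syy - n * ybar ** 2) / (n - 1)    # sample variance of y
    cxy = (sxy - n * xbar * ybar) / (n - 1)  # sample covariance of x and y
    return xbar, ybar, s2x, s2y, cxy


# Hypothetical data, used only to produce the five sums.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
n = len(xs)

xbar, ybar, s2x, s2y, cxy = sample_estimates(
    n,
    sum(xs),
    sum(x * x for x in xs),
    sum(ys),
    sum(y * y for y in ys),
    sum(x * y for x, y in zip(xs, ys)),
)
print("means:", xbar, ybar)
print("standard deviations:", math.sqrt(s2x), math.sqrt(s2y))
print("covariance:", cxy)
```

Subtracting two large, nearly equal sums can lose precision in floating point, so for large data sets with a mean far from zero it may be safer to compute the variance from the raw data if it is still available.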

  • 0
    If you have $\sum xy$ you can estimate the covariance between $X$ and $Y$ like this: $\tfrac{1}{n - 1} \left( \sum xy - \bar{y} \sum x - \bar{x} \sum y + n\bar{x}\bar{y} \right) = \tfrac{1}{n - 1} \left( \sum xy - n\bar{x}\bar{y} \right)$. The sample means are the center of your estimated two-dimensional normal distribution. 2012-12-30