6

Possible Duplicate:
Why do we use a Least Squares fit?

To find the normal equations in the derivation of the regression line, we use the method of least squares. Since we want to make the error small for each individual, we could instead take the sum of the absolute values of the individual errors rather than the sum of their squares. Why is the squared-error criterion used for regression and not the absolute-error one?

4 Answers

6

It is not true that minimizing the sum of absolute errors is never used. Least squares is used because it is equivalent to maximum likelihood estimation when the model residuals are normally distributed with mean 0. But when the distribution of the error term is non-normal, particularly when it has heavy tails, the least squares estimates are no longer best, and robust procedures such as minimizing the sum of absolute errors are preferable.
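A minimal sketch of this point, using made-up data (the numbers and the numpy calls are my own illustration, not part of the answer): the least-squares estimate of a location parameter is the sample mean, while the least-absolute-errors estimate is the sample median, and one heavy-tailed outlier drags the mean much further than the median.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=1.0, size=99)  # well-behaved, roughly normal noise
data = np.append(data, 1000.0)                   # one heavy-tailed outlier

# The mean minimizes sum((x - c)^2); the median minimizes sum(|x - c|).
mean = data.mean()
median = np.median(data)

print(f"least-squares estimate (mean):    {mean:.2f}")    # pulled far toward 1000
print(f"least-absolute estimate (median): {median:.2f}")  # stays near 10
```

The same contrast carries over to regression: the least-absolute-errors line is far less sensitive to a few wild points than the least-squares line.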

3

The real answer is that in many cases the sum of squared errors has a simple closed-form optimal solution, whereas other loss functions lead to optimization problems that must be solved by numerical approximation.
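As a concrete sketch of this (the data are invented for illustration): the least-squares coefficients come straight from the normal equations, $\beta = (X^T X)^{-1} X^T y$, while the least-absolute-errors coefficients have no such closed form and must be found by a numerical optimizer.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Least squares: solve the normal equations X^T X beta = X^T y directly.
X = np.column_stack([np.ones_like(x), x])
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Least absolute errors: no closed form, so minimize sum|y - X beta|
# numerically (Nelder-Mead copes with the non-smooth objective).
lad_loss = lambda b: np.abs(y - X @ b).sum()
beta_lad = minimize(lad_loss, x0=beta_ls, method="Nelder-Mead").x

print("least squares (closed form):", beta_ls)
print("least absolute (numerical): ", beta_lad)
```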

  • @HagenvonEitzen Considering that some fields settled on their standard statistical methods back when all calculations were done by hand, I find it plausible that ease of calculation was a sufficient reason. (2012-09-03)
2

Because we prefer many small errors over one big error.

Physically, this is because a small amount of noise is present in any data, so a small deviation everywhere is only to be expected.

So technically, a regression analysis gives you a fit consisting of whatever model you already assumed, plus some noise on every data point.

A big error at a specific point, however, indicates that the noiseless model does not really fit the data, since big errors are not typical of noise. This is why we fit under the assumption that big errors are unlikely.
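A quick numerical illustration of this preference for many small errors over one big error, with residual vectors I have made up for the purpose: both fits below have the same total absolute error, but least squares strongly favors the one whose error is spread out as small, noise-like deviations.

```python
import numpy as np

many_small = np.array([1.0, 1.0, 1.0, 1.0])  # small deviation everywhere: looks like noise
one_big    = np.array([4.0, 0.0, 0.0, 0.0])  # one gross misfit: looks like a wrong model

print(np.abs(many_small).sum(), np.abs(one_big).sum())  # 4.0 vs 4.0: absolute error ties
print((many_small**2).sum(),    (one_big**2).sum())     # 4.0 vs 16.0: squares prefer many small
```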

  • Mathematically, an error is defined as the absolute value of the difference, and errors never cancel out. (2012-09-03)
2

In addition to all the reasons given in the other answers, consider that, in comparison to the sum of absolute values of errors, the sum of squared errors gives greater weight to large errors and less weight to small errors. This is because $$x^2 > |x| \quad \text{if } |x| > 1, \qquad x^2 < |x| \quad \text{if } 0 < |x| < 1.$$ This affects the fitting of the regression line in ways that some might consider more desirable.