
Given a set of 5 points (e.g. (1, 3), (2, 8), etc.), how can I get just the slope of the best-fit line?

I've been looking up least squares regression, but I'm rather statistics ignorant and don't understand most of the terminology and math behind it. Can anyone explain it a bit more simply?

  • 1
    It's a bit hard to construct a good answer because one has to guess more simply _than what_. Could you perhaps point to one of the resources you fail to understand, so as to give an upper bound for the level of answer you desire? It would also help if you could sketch, in a few sentences, the most advanced facts you _already know_ about the problem, so that we don't have to waste focus explaining those again unnecessarily. For example, do you know what distinguishes a least-squares fit from other possible fits and why it's the one you want? – 2011-11-15

3 Answers

5

By best-fit line, I presume you mean the least-squares fit. The "least-squares fit line" for the given data $\{ (x_i, y_i) \}_{i=1}^n$ is, by definition, simply the line $\ell_{a,b}$ with equation $y = a + bx$ that minimizes the squared error
$$ Q(a,b) := \sum_{i=1}^n (y_i - a - bx_i)^2. $$
Notice that the quantity $|y_i - a - bx_i|$ is a measure of the deviation of the point $(x_i, y_i)$ from the line; "squared error" refers to the fact that we sum the squares of these deviations over the $n$ data points. [Another reasonable choice would be to minimize the sum of absolute errors $\sum\limits_{i=1}^n |y_i - a - bx_i|$, but least squares has the advantage that it is easy to compute the minimizer analytically*.]
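To make the objective concrete, here is a small sketch (my own illustration, not part of the original answer) that evaluates $Q(a,b)$ for candidate lines; the specific points are made up for the example, and the line $y \approx 4.3 + 0.5x$ is the least-squares line computed for them further below.

```python
# Illustrative sketch: evaluate the least-squares objective Q(a, b)
# for a candidate line y = a + b*x on made-up example points.
points = [(1, 3), (2, 8), (4, 5), (5, 7), (3, 6)]

def Q(a, b):
    """Sum of squared vertical deviations of the points from y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in points)

print(Q(4.3, 0.5))  # 12.3 -- the least-squares line for these points (computed below)
print(Q(0.0, 2.0))  # 35.0 -- any other line gives a larger Q
```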

To find the line $\ell_{a,b}$ that minimizes $Q$, we resort to calculus. Taking partial derivatives of $Q$ with respect to $a$ and $b$, we get
$$ \frac{\partial Q}{\partial a} = \sum_{i=1}^n 2 (a + bx_i - y_i) = 2an + 2b \sum_i x_i - 2\sum_i y_i, $$
$$ \frac{\partial Q}{\partial b} = \sum_{i=1}^n 2 (a + bx_i - y_i) x_i = 2a \sum_i x_i + 2b \sum_i x_i^2 - 2\sum_i x_i y_i. $$
Setting both partial derivatives to $0$ gives two linear equations (the so-called normal equations), which you can solve for $a$ and $b$; the resulting $b$ is the slope you asked for.
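As a sketch of how that solve plays out numerically (my own addition; the data points are made up for illustration), one can accumulate the sums appearing in the two equations and solve the resulting $2 \times 2$ linear system:

```python
# Sketch: solve the normal equations  a*n + b*Sx = Sy  and  a*Sx + b*Sxx = Sxy
# obtained by setting the two partial derivatives of Q to zero.
points = [(1, 3), (2, 8), (4, 5), (5, 7), (3, 6)]  # made-up example data
n = len(points)
Sx = sum(x for x, _ in points)
Sy = sum(y for _, y in points)
Sxx = sum(x * x for x, _ in points)
Sxy = sum(x * y for x, y in points)

b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)  # slope of the least-squares line
a = (Sy - b * Sx) / n                          # intercept
print(a, b)                                    # 4.3 0.5 for these points
```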


*EDIT: Added the qualification "analytically". See the comments under guy's answer for more on this.

  • 0
    Thank you, this helps a lot! – 2011-11-17
1

If you have $(x_1, y_1), \ldots, (x_n, y_n)$ and all you care to do is fit a straight line to the data (ignoring any actual statistics), a reasonable thing to do is minimize $\sum_{i = 1} ^ n (y_i - \alpha - \beta x_i)^2$ with respect to $\alpha$ and $\beta$, with solution say $\hat \alpha$ and $\hat \beta$; the line $\ell(x) = \hat \alpha + \hat \beta x$ is the line corresponding to your so-called "least squares" fit.

Why is this reasonable? Well, the value $y_i - \alpha - \beta x_i$ is the amount by which our line has missed the value of $y_i$. We would like to construct a line that, in general, doesn't miss by much, so we aim to minimize the above sum. We square the difference to make things positive, since we don't want negative and positive misses to cancel each other out; we care about the magnitude of the miss, not its sign. We also could have minimized $\sum |y_i - \alpha - \beta x_i|$; this gives a different fit with somewhat different properties, but in general that fit is more difficult to compute since we can't differentiate $|\cdot|$.

The slope of this line is the value $\hat \beta$. To get $\hat \beta$, set $\bar y = n^{-1} \sum y_i$ and $\bar x = n^{-1} \sum x_i$. It turns out that the value of $\hat \beta$ is given by
$$ \hat \beta = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}, $$
which you can get by differentiating $f(\alpha, \beta) = \sum (y_i - \alpha - \beta x_i)^2$ with respect to $\alpha$ and $\beta$ and setting the derivatives to zero.
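For a concrete check (my own sketch, not part of the answer; the data points are made up), the formula can be evaluated directly:

```python
# Sketch: slope via the centered-sums formula for beta-hat.
points = [(1, 3), (2, 8), (4, 5), (5, 7), (3, 6)]  # made-up example data
xbar = sum(x for x, _ in points) / len(points)
ybar = sum(y for _, y in points) / len(points)

beta_hat = (sum((x - xbar) * (y - ybar) for x, y in points)
            / sum((x - xbar) ** 2 for x, _ in points))
alpha_hat = ybar - beta_hat * xbar  # intercept, for completeness

print(beta_hat)  # 0.5 -- the slope of the least-squares line for these points
```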

  • 2
    I agree. It seemed by "difficulty of the fit" you were referring to the actual computation needed. I think the main reason least squares has such a firm place in the classroom is that there is such incredibly rich theory associated with it. – 2011-11-16
0

Well, you asked for a "simpler" description of regression in the least-squares sense. I find the following one simpler, although its basic ingredient may be harder to picture than the usual approach: thinking in terms of $n$ dimensions, where $n$ is the number of observations, i.e. the number of people questioned, or in your case $n=5$ for the five given points.

If this can be imagined without too much hassle, then the following should also be a "simpler" explanation.

The variables are then vectors from the origin into that $n$-dimensional space. From your example we have the points $(1,3), (2,8)$, which we extend to, for instance, $(4,5), (5,7), (3,6)$ so that we have $n=5$ points.
Then we can rewrite this as the vector $X$ pointing to $(1,2,4,5,3)$ in that five-dimensional space and the vector $Y$ pointing to $(3,8,5,7,6)$.
We also define the "mean vector" $\small M(k)$ pointing to the coordinate $\small k \cdot (1,1,1,1,1)$ on the (multidimensional) diagonal.

Clearly $\small M((1+2+4+5+3)/5)=M(3)=3 \cdot (1,1,1,1,1) $ is the vector pointing to the mean of the coordinates of $X$, and $\small M((3+8+5+7+6)/5)=M(29/5)=5.8 \cdot (1,1,1,1,1) $ is the vector pointing to the mean of the coordinates of $Y$.
Note that $M(3)$ is the point on the diagonal which is nearest to the tip of $X$: to express the distance between two points we simply need the Pythagorean formula applied to their coordinates, and that is exactly the least-squares criterion. The same is true for the distance of $M(5.8)$ from the tip of $Y$. Just as a spin-off of this model we "see" immediately that the means are the best approximations to a set of values in the sense of least squares.
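A tiny numerical check of that claim (my own sketch, not part of the answer): scanning along the diagonal, the squared distance to the tip of $X$ is smallest at the mean of the coordinates of $X$.

```python
# Sketch: the squared distance from X = (1,2,4,5,3) to the diagonal point
# M(k) = k*(1,1,1,1,1) is smallest at k = 3, the mean of X's coordinates.
X = [1, 2, 4, 5, 3]

def dist2(k):
    """Squared (Pythagorean) distance from the tip of X to M(k)."""
    return sum((x - k) ** 2 for x in X)

for k in (2.0, 2.5, 3.0, 3.5, 4.0):
    print(k, dist2(k))  # the minimum (10.0) is attained at k = 3.0
```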

The same model can now be used to express the linear regression: we want to combine the mean vector $M(\cdot)$ and the vector $X$ in such a way that we come as near as possible to the tip of $Y$. Said differently, we want to find a vector $\small \hat Y = a \cdot M(3)+b \cdot X = M(3 a)+b \cdot X $, with $a$ and $b$ chosen such that the tip of $\small \hat Y $ is nearest to the tip of $Y$.
That's linear regression in the sense of least-squares approximation.
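To connect this picture back to concrete numbers, here is a rough sketch (my own addition, using NumPy; not part of the answer) that finds $\hat Y$ as the orthogonal projection of $Y$ onto the plane spanned by the diagonal direction and $X$. Note that the coefficient on the all-ones vector equals $3a$ in the answer's notation, since $M(3) = 3 \cdot (1,1,1,1,1)$.

```python
# Sketch of the geometric view: the least-squares Y-hat is the orthogonal
# projection of Y onto the plane spanned by the all-ones vector and X.
import numpy as np

X = np.array([1, 2, 4, 5, 3], dtype=float)   # the answer's example coordinates
Y = np.array([3, 8, 5, 7, 6], dtype=float)
ones = np.ones_like(X)                       # direction of the diagonal

A = np.column_stack([ones, X])               # columns span the plane we project onto
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
intercept, slope = coeffs                    # intercept corresponds to 3a in the answer's notation
Y_hat = A @ coeffs                           # tip of Y-hat, the nearest point to Y in the plane

print(slope, intercept)                      # 0.5 and 4.3 for these points
print(np.allclose(A.T @ (Y - Y_hat), 0))     # True: the residual is orthogonal to the plane
```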