2

My question today is about the minimization of an error function with two parameters. It is a function that measures the error of a set of points. The two parameters are the weights of a regressor.

$$\frac{1}{N}\sum_{t=1}^{N}[r^t-(w_1x^t+w_0)]^2$$

The minimum should be found by taking the partial derivatives of the error function above with respect to $w_1$ and $w_0$, setting them equal to $0$, and solving for the unknowns. However, I didn't reach the solutions given. The solutions should be:
$$w_1=\frac{\sum_t x^t r^t-N\left(\sum_t\frac{x^t}{N}\right)\left(\sum_t\frac{r^t}{N}\right)}{\sum_t(x^t)^2-N\left(\sum_t\frac{x^t}{N}\right)^2}$$
$$w_0=\sum_t\frac{r^t}{N}-w_1\sum_t\frac{x^t}{N}$$

They are performing well in practice. But my question is, can I reach them by taking the partial derivatives and setting them equal to $0$? Can anybody help me, at least with one? Thank you.

UPDATE:
This is the regressor I get by using the $w_1$ and $w_0$ listed above. As you can see, the two model the data very well, so they must be right.
[image: plot of the fitted regressor]

UPDATE 2:
I will post the passage from the book that lists $w_1$ and $w_0$ as the solution. Maybe you'll get the idea better.
[image: passage from the book]

  • 1
    Your most recent edit has the correct expressions for $w_0$ and $w_1$ (and they're the same as the expressions in Henry's answer). 2012-03-14
  • 0
    @MDCCXXIX Yes, thanks. They are the same, right :) I guess the reason my graphic performed well and your graphic of my solutions didn't was a typo. Thanks a lot. 2012-03-14
  • 0
    @MDCCXXIX I succeeded in doing it myself. Thank you guys once again (Good luck with OCaml :)) 2012-03-14

2 Answers

2

Let's start by ignoring the constant $\frac{1}{N}$. Then

$$\begin{aligned}\sum_t[r^t-(w_1x^t+w_0)]^2 &= \sum_t (r^t)^2 + w_1^2 \sum_t (x^t)^2 + N w_0^2 \\ &\quad - 2 w_1 \sum_t r^t x^t - 2 w_0 \sum_t r^t + 2 w_1 w_0 \sum_t x^t\end{aligned}$$

Take the partial derivatives with respect to $w_0$ and $w_1$ and set them to zero and you get

$$ 2N w_0 - 2 \sum_t r^t + 2 w_1 \sum_t x^t = 0,$$
$$ 2 w_1 \sum_t (x^t)^2 - 2 \sum_t r^t x^t + 2 w_0 \sum_t x^t = 0.$$
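
As a sketch of the intermediate step (not part of the original derivation), dividing each condition by 2 and moving the data terms to the right-hand side gives a $2\times 2$ linear system in $w_0$ and $w_1$, listed in the same order as above. Here $(x^t)^2$ denotes the square of the $t$-th input.

$$\begin{aligned} N w_0 + w_1\sum_t x^t &= \sum_t r^t,\\ w_0\sum_t x^t + w_1\sum_t (x^t)^2 &= \sum_t r^t x^t.\end{aligned}$$

Multiplying the first equation by $\sum_t x^t$ and the second by $N$, then subtracting the first from the second, eliminates $w_0$:

$$w_1\left[N\sum_t (x^t)^2 - \Big(\sum_t x^t\Big)^2\right] = N\sum_t r^t x^t - \Big(\sum_t r^t\Big)\Big(\sum_t x^t\Big).$$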

The first of these gives your expression for $w_0$. Solving these simultaneous equations, I think, gives
$$w_0 = \dfrac{(\sum_t r^t) (\sum_t (x^t)^2) - (\sum_t r^t x^t)(\sum_t x^t) }{ N \sum_t (x^t)^2 -(\sum_t x^t)^2 },$$
$$w_1=\dfrac{ N \sum_t r^t x^t - (\sum_t r^t)(\sum_t x^t) }{ N \sum_t (x^t)^2 -(\sum_t x^t)^2 }$$

and this last sadly does not look close enough to your quoted expression for $w_1$.

I may have made an error, but letting $r^t=1$ for every $t$, I would expect the optimal values to be $w_0=1$ and $w_1=0$ to minimise the original expression. Mine seem to do that, while the one you quote for $w_1$ does not. I think there may also be dimensional issues with the expression you quote for $w_1$.
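
The $r^t=1$ check can be written out as follows. This is a sketch using the solved expressions above, with $D = N\sum_t (x^t)^2 - (\sum_t x^t)^2$ as shorthand introduced here (it is not notation from the question), and $(x^t)^2$ denoting the squared input. Substituting $\sum_t r^t = N$ and $\sum_t r^t x^t = \sum_t x^t$ gives

$$w_1 = \frac{N\sum_t x^t - N\sum_t x^t}{D} = 0, \qquad w_0 = \frac{N\sum_t (x^t)^2 - (\sum_t x^t)^2}{D} = 1.$$

The expression in the question was later edited (see the comments under the question). For comparison with that form, dividing the numerator and denominator of $w_1$ by $N$ and writing $\bar x = \frac{1}{N}\sum_t x^t$ and $\bar r = \frac{1}{N}\sum_t r^t$ (again shorthand assumed here) gives

$$w_1 = \frac{\sum_t x^t r^t - N\,\bar x\,\bar r}{\sum_t (x^t)^2 - N\,\bar x^2}, \qquad w_0 = \bar r - w_1\,\bar x,$$

where the expression for $w_0$ comes from dividing the first zero-derivative condition by $2N$.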

  • 0
    It doesn't seem to work. Neither does the result I got from differentiating. It's very strange. That is the error function and this is the way to do it. I tested it on a training set of samples, and yours and mine don't model the data very well, while the $w_1$ and $w_0$ I listed above do. 2012-03-14
  • 0
    Are you sure? I just plotted some data and best fit lines using the formulas in your question and those in Henry's answer, and the formula in Henry's answer fits the data much better. Also, I checked his calculations, and they're correct. 2012-03-14
  • 0
    @Henry Thank you a lot. 2012-03-14
1

I concur with the results in Henry's answer. In case you want to try it out for yourself, here's some Matlab code implementing the two solutions (note that due to Matlab's indexing rules, $w_0$ is w(1) and $w_1$ is w(2), etc.).

% Create a fake dataset

x = linspace(2,8,30)';
r = 0.5 + 0.1 * x + 0.1 * randn(30,1);

% Formulas from your question (as posted before the later edit)

w(2) = (sum(x.*r) - mean(x)*mean(r)) / (sum(x.^2) - sum(x));
w(1) = mean(r) - w(2) * mean(x);

% Formulas from Henry's answer

v(1) = (mean(x.^2) * mean(r) - mean(x) * mean(r.*x)) / (mean(x.^2) - mean(x)^2);
v(2) = (mean(r.*x) - mean(r)*mean(x)) / (mean(x.^2) - mean(x)^2);

% Plot the data

plot(x,r,'xr')
hold on
xlabel('House size')
ylabel('Price')

% Plot your best fit line (green)

plot([2 8], [1 2;1 8] * w', 'g')

% Plot Henry's best fit line (blue)

plot([2 8], [1 2;1 8] * v', 'b')

This should result in the following plot:

[image: scatter plot of the data (House size vs. Price) with the two best fit lines, green and blue]

  • 0
    Well, let me try once again in Matlab and I'll come back with an answer :) I'm using a house size/house price data set. 2012-03-14
  • 0
    Oh, I see what you did. It works best in this case, but it's not how I use it. Every point on that line is the result of the linear regression function $g(x)=w_1 x+w_0$, where $w_1$ and $w_0$ are called the "weights" of the regressor. I guess this is why we get different results. Am I right? 2012-03-14
  • 0
    I posted the page from the book that gives the solution. 2012-03-14