
My question today is about minimizing an error function of two parameters. The function measures the error over a set of points, and the two parameters are the weights of a linear regressor.

$\frac{1}{N}\sum_{t=1}^{N}[r^t-(w_1x^t+w_0)]^2$

The minimum should be found by taking the partial derivatives of the error function above with respect to $w_1$ and $w_0$, setting them equal to $0$, and solving for the unknowns. However, I didn't reach the solutions given. The solutions should be:
$w_1=\frac{\sum_tx^tr^t-\sum_t\frac{x^t}{N}\sum_t\frac{r^t}{N}N}{\sum_t(x^t)^2-N(\sum_t\frac{x^t}{N})^2}$
$w_0=\sum_t\frac{r^t}{N}-w_1\sum_t\frac{x^t}{N}$

They perform well in practice. But my question is: can I reach them by taking the partial derivatives and setting them equal to $0$? Can anybody help me, at least with one of them? Thank you.

UPDATE:
This is the regressor I get by using the $w_1$ and $w_0$ listed above. As you can see, the two model the data very well, so they must be right. [plot of the fitted line through the data]

UPDATE 2:
I will post the passage from the book that lists $w_1$ and $w_0$ as the solution; maybe that will make the idea clearer. [scan of the book page]

  • 0
    @MDCCXXIX I succeeded in doing it myself. Thank you guys once again. (Good luck with OCaml!) 2012-03-14

2 Answers


Let's start by ignoring the constant $\frac{1}{N}$. Then

$\sum_t[r^t-(w_1x^t+w_0)]^2 $ $= \sum_t (r^t)^2 + w_1^2 \sum_t (x^t)^2 + N w_0^2 - 2 w_1 \sum_t r^t x^t - 2 w_0 \sum_t r^t + 2 w_1 w_0 \sum_t x^t $

Take the partial derivatives with respect to $w_0$ and $w_1$, set them to zero, and you get

$ 2N w_0 -2 \sum_t r^t + 2 w_1 \sum_t x^t = 0,$ $ 2 w_1 \sum_t (x^t)^2 -2 \sum_t r^t x^t + 2 w_0 \sum_t x^t = 0.$

The first of these gives your expression for $w_0$. Solving these simultaneous equations I think gives $w_0 = \dfrac{(\sum_t r^t) (\sum_t (x^t)^2) - (\sum_t r^t x^t)(\sum_t x^t) }{ N (\sum_t (x^t)^2) -(\sum_t x^{t})^2 },$ $w_1=\dfrac{ N (\sum_t r^t x^t) - (\sum_t r^t)(\sum_t x^{t}) }{ N (\sum_t (x^t)^2) -(\sum_t x^{t})^2 }$

and this last sadly does not look close enough to your quoted expression for $w_1$.
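For completeness, here is the elimination step behind those simultaneous equations: substituting $w_0 = \frac{1}{N}\bigl(\sum_t r^t - w_1 \sum_t x^t\bigr)$ from the first equation into the second and multiplying through by $N$ gives

$w_1 \Bigl(N \sum_t (x^t)^2 - \bigl(\sum_t x^t\bigr)^2\Bigr) = N \sum_t r^t x^t - \sum_t r^t \sum_t x^t,$

which rearranges to the expression for $w_1$ above.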

I may have made an error, but letting $r^t=1$ for all $t$, I would expect the optimal values to be $w_0=1$ and $w_1=0$ to minimise the original expression. Mine seem to do that, while the one you quote for $w_1$ does not. I think there may also be dimensional issues with the expression you quote for $w_1$.
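As a quick numerical sanity check of the closed-form expressions above (an editor-added sketch in Python/NumPy with made-up data, since the rest of the thread uses Matlab; the reference fit comes from numpy.polyfit, which solves the same degree-1 least-squares problem directly):

```python
import numpy as np

# Made-up sample data (hypothetical, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
r = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
N = len(x)

# Closed-form solution from the answer above
den = N * np.sum(x**2) - np.sum(x)**2
w1 = (N * np.sum(r * x) - np.sum(r) * np.sum(x)) / den
w0 = (np.sum(r) * np.sum(x**2) - np.sum(r * x) * np.sum(x)) / den

# Reference: NumPy's own degree-1 least-squares fit
slope, intercept = np.polyfit(x, r, 1)
print(abs(w1 - slope) < 1e-9, abs(w0 - intercept) < 1e-9)   # True True

# The r^t = 1 test from above: a constant target should give w1 = 0, w0 = 1
r1 = np.ones(N)
w1_const = (N * np.sum(r1 * x) - np.sum(r1) * np.sum(x)) / den
w0_const = (np.sum(r1) * np.sum(x**2) - np.sum(r1 * x) * np.sum(x)) / den
print(abs(w1_const) < 1e-12, abs(w0_const - 1.0) < 1e-12)   # True True
```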

  • 0
    @Henry Thank you a lot. 2012-03-14

I concur with the results in Henry's answer. In case you want to try it out for yourself, here's some Matlab code implementing the two solutions (note that, due to Matlab's one-based indexing, $w_0$ is w(1) and $w_1$ is w(2), etc.).

    % Create a fake dataset
    x = linspace(2,8,30)';
    r = 0.5 + 0.1 * x + 0.1 * randn(30,1);

    % Formulas from your question
    w(2) = (sum(x.*r) - mean(x)*mean(r)) / (sum(x.^2) - sum(x));
    w(1) = mean(r) - w(2) * mean(x);

    % Formulas from Henry's answer
    v(1) = (mean(x.^2) * mean(r) - mean(x) * mean(r.*x)) / (mean(x.^2) - mean(x)^2);
    v(2) = (mean(r.*x) - mean(r)*mean(x)) / (mean(x.^2) - mean(x)^2);

    % Plot the data
    plot(x,r,'xr')
    hold on
    xlabel('House size')
    ylabel('Price')

    % Plot your best fit line (green)
    plot([2 8], [1 2;1 8] * w', 'g')

    % Plot Henry's best fit line (blue)
    plot([2 8], [1 2;1 8] * v', 'b')

This should result in the following plot:

[plot of the data points with the two best-fit lines, green and blue]

  • 0
    I posted the page from the book that gives the solution. 2012-03-14