2
$\begingroup$

i want to use least square to find x and y that minimize the result of the following function for a series of points (xi,yi) -> (x1,y1), (x2,y2),...:

note: y = f(x)

E(x,y) = SUM  (y - ((mi*x) - (mi*xi)))^2
          i    --------------------
                    1 + |mi|^2

where "mi" is the slope of f(xi, yi)

now my math is very rusty, and i think to use least square to find x and y that minimize the function above i need to find the derivative of the function above in respect to x and y independently where the derivative of each is equal to 0.

can someone please show me how to perform derivation on the above function in respect to x and y please?

thanks!

edit: I did check a few examples on the web, such as the one given below by potato, but here is where i have difficulty:

  1. these examples always look for m and b, thing is in my case, i already have m and b, what i need is the x that will give the best curve. in short, the curve im looking for has to be a point on the function f(x) with the optimized slope.
  2. This is due to my own shortcomings in calculus, but all examples give are only numerators, but the example i have above contains a denominator. Can someone help me understand how to derive with a denominator please?

thanks!

  • 0
    This is a standard result and you can find it in most linear algebra textbooks. It is in, for example, Lay's book on linear algebra.2012-12-31
  • 0
    that’s the problem, didn’t do derivation in years and what is probably simple for most is making me go crazy; im at a complete loss. if its too easy for a complete answer, can someone at least explain the steps to take please? will accept useful unwinding answer. thanks!2012-12-31
  • 0
    It's not simple at all, don't feel discouraged! I just meant that it is a result that is in many textbooks. let me see if I can find a write-up online.2012-12-31
  • 0
    Try this: http://www.tc3.edu/instruct/sbrown/stat/leastsq.htm.2012-12-31
  • 0
    thanks! allow me to read and try to make sense of it all :)2012-12-31
  • 0
    @JayDee: If you already have $m$ and $b$, then it doesn't sound to me like you're doing least squares anymore. The thing that confuses me is what $m_i$ is. You mentioned that it's the slope of the function at a particular value, but if you're considering a single line then there is only one slope, so are all the $m_i$ the same?2012-12-31
  • 0
    _mi_ is the slope at point (xi,yi). at each summation iteration the slope changes in function of f(xi) . The goal here is to find the point of the curve that minimizes the error as per explained here in section 2: [link to paper](http://www.cs.rice.edu/~jwarren/papers/dmc.pdf)2012-12-31
  • 0
    @JayDee: Typically we don't call those "summation iterations" but rather the "terms" of the summation (as in polynomials).2012-12-31
  • 0
    thanks, my calculus is indeed far back, it was refreshing to jump back in in the last hours. getting back up to speed...2012-12-31

2 Answers 2

1

Okay, as best I can tell, $f$ is a nonlinear differentiable function of one real variable, and you want to find the $x\in\mathbb{R}$ for which $E(x,f(x))$ is minimal. This we can do.

We want to find $$\frac{dE}{dx}= \frac{d}{dx}\left[\sum_i \frac{[f(x)-m_i(x-x_i)]^2}{1+{m_i}^2}\right].$$

We can move the derivative inside the summation (provided something like uniform convergence of the summation, but I think you only have a finite number of $x_i$, right?): $$\frac{dE}{dx}= \sum_i \frac{d}{dx}\left[\frac{[f(x)-m_i(x-x_i)]^2}{1+{m_i}^2}\right].$$

For convenience, I'm going to expand the square. At the same, notice that $m_i$ is simply a real number, so the denominator simply moves outside the derivative: $$\frac{dE}{dx}= \sum_i (1+m_i)^{-1}\frac{d}{dx}\left[~f(x)^2-2m_i(x-x_i)f(x)+m_i^2(x-x_i)^2\right].$$

Then, chain rule and product rule away: $$\frac{dE}{dx}= \sum_i (1+m_i)^{-1}\left[~ 2f(x)f'(x) - 2m_if(x)-2m_i(x-x_i)f'(x) + 2m_i^2(x-x_i) \right].$$

We want this to all equal zero, so we can factor out the twos, and for fun I'll put the whole thing looking like a fraction again: $$\sum_i \frac{f(x)f'(x) - m_if(x)-m_i(x-x_i)f'(x) + m_i^2(x-x_i)}{ 1+m_i^2}=0.$$

This is about as simple as it gets without knowing the details of your $f$ and $f'$. Hopefully you have some software so that you don't have to solve analytically :/

EDIT: Knowing that you are considering these as iterations means that you probably don't have a finite collection of $x_i$; one thing which is sufficient for $\sum_i \frac{dg_i}{dx}= \frac{d}{dx}\sum_i g_i$ is that the $\frac{dg_i}{dx}$ are uniformly convergent and there is at least one real number $a$ such that $\sum_i g_i(a)$ converges.

If you don't have that, I'm not really sure what can be done.

  • 0
    my bad, i had misunderstood the problem. Indeed it is not applicable for a least square linear solving approach, but does work well with a non linear gradient descent approach. still your answer is as close as can be with what i had given you, you deserve the points. thanks!2012-12-31
1

So I am still confused. Are you saying that you know all of the $m_i$'s and you need to find $x$ and $y$ such that $$E(x,y)=\sum_i \frac{(y-(m_ix-m_ix_i))^2}{1+|m_i|^2}$$ is minimized? But then you say you want to optimize the slope? What exactly is known and what are you trying to find? Are $x$ and $y$ vectors? What are $m_i$? Are they scalars or vectors or what because you wrote $|m_i|^2$ instead of $m_i^2$.

To answer your question, the method you are talking about, taking partials and setting them equal to zero, is for linear least squares fitting. You can only do LLSF when your function is linear in the coefficients you are trying to find. In your $E(x,y)$, you have non-linearity in all of your variables. So it doesn't matter if you want to solve for $x$ or $y$ or $m_i$'s, you can't use LLSF. There are plenty of other optimization algorithms you can use depending on what you know about your $E(x,y)$. And if you want to use least squares, then you have to use non-linear least squares fitting. This is exactly why you can't find anything "containing a denominator" as you mentioned because those would all be non-linear.

  • 0
    im trying to find the point x on function f(x), that minimizes the error on the function f(x). the idea is to use multiple sample points on the curve and use Least squares to find the minimum error. see section 2: [link to research paper](http://www.cs.rice.edu/~jwarren/papers/dmc.pdf)2012-12-31