
I have been given a least squares problem to solve, together with its solution, but I do not understand the steps provided and I am seeking clarity on why certain things in the solution have been done.

I am given a nonlinear model along with measurements for $T_j, K_j$ and $\Delta K_j$.

$K_j = C\exp(-E/T_j)+R_j$,

where $E$ and $C$ need to be estimated and $R_j$ is independent and normally distributed.

First I am asked to show the model can be transformed to an equation of a different form that is linear:

$\ln(K_j)=\ln(C)-\frac{E}{T_j}+r_j$,

where $r_j=\frac{R_j}{C\exp(-E/T_j)}$, which I can do. But after this I can't follow the solution very well.

I need to show that the standard deviation of $r_j$ is $\delta_j=\frac{\Delta K_j}{K_j}$. The solution states that because $r_j$ is independent and normally distributed we can write:

$r_j=\frac{R_j}{K_j-R_j}\approx\frac{R_j}{K_j} \rightarrow \delta_j:=\frac{\Delta K_j}{K_j}$.
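To see numerically why the standard deviation of $r_j$ matches the relative error $\Delta K_j/K_j$, a quick simulation helps. The values of $C\exp(-E/T_j)$ and the noise level below are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 0.25       # hypothetical noise-free value C*exp(-E/T_j)
sigma = 0.005   # hypothetical std of R_j, i.e. the measurement error Delta K_j

R = sigma * rng.standard_normal(100_000)   # samples of R_j
r = R / mu                                 # r_j = R_j / (C exp(-E/T_j))

# The standard deviation of r_j is sigma/mu, i.e. the relative error
# Delta K_j / K_j (to first order, since K_j ≈ mu when the noise is small).
```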

So I don't really understand why this has been done; obviously I have a gap in my knowledge here, and I am looking to fill it!

The problem then continues and states that, in order to use the least squares method, $r_j$ needs to be scaled so that all $r_j$ have the same standard deviation, giving the following relation:

$\sum^m_{j=1}\left(\frac{r_j}{\delta_j}\right)^2=\sum^m_{j=1}\left(\frac{-E/T_j+\ln(C)-\ln(K_j)}{\delta_j}\right)^2=\min$.

The solution continues in code, but that is ok. What I need to understand is how the problem is formulated and rearranged in this way. As I understood it we apply the least squares method when we can't solve a system but want to find the closest solution possible to solving a system.

So specific questions I have regarding this choice of rearrangement would be:

Why do we make the choice to set $r_j$ on one side and not one of the other variables such as $T_j$ or $K_j$?

Why is it important that the standard deviation is scaled to 1 for everything (I assume that is why we have divided by it)?

When I look at the problem I get a system where the form should be $\lVert Ax-b\rVert_2=\min$. Why do I make the choice that $A$ will consist of the $\frac{1}{T_j}$ and $b$ of the $\ln(K_j)$?

The main problem I think is I am used to solving this with artificial problems and this problem looks more "real" and has actual data but I am having trouble following the method which means I haven't understood it so well. I hope I have been clear and provided enough information, but please let me know if anything is confusing or if I could provide anything else to help you help me. Cheers.
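For concreteness, the scaled least-squares step quoted above can be sketched in Python roughly as follows; the data values here are hypothetical placeholders, not the actual measurements from the problem:

```python
import numpy as np

# Hypothetical measurements T_j, K_j and their absolute errors Delta K_j.
T  = np.array([300.0, 320.0, 340.0, 360.0, 380.0])
K  = np.array([0.18, 0.22, 0.27, 0.31, 0.36])
dK = np.array([0.01, 0.01, 0.02, 0.02, 0.03])

delta = dK / K   # standard deviation of r_j (the relative error)

# Model: ln(K_j) = ln(C) - E/T_j + r_j.  Dividing each equation by delta_j
# makes every scaled residual r_j/delta_j have standard deviation 1.
A = np.column_stack([np.ones_like(T), -1.0 / T]) / delta[:, None]
b = np.log(K) / delta

x, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizes ||Ax - b||_2
lnC, E = x
C = np.exp(lnC)
```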

  • 1
I get $$ K_j = C e^{-E/T_j} + R_j \iff \\ \ln(K_j-R_j) = \ln(C) -E/T_j $$ Are you sure about the $R_j$ outside the argument of the exponential function? (2017-02-04)
  • 0
Yes, they used a Taylor approximation to get it in that form. Sorry I didn't put that in the initial statement. But the equation I have given is from the provided solution of the problem and I was also able to get to it on paper. I could have made a mistake, but I believe it is correct. (2017-02-04)

3 Answers


Least squares works well when you have a known linear equation relating known, independent variables to observed measurements of the dependent variable. (See e.g. the Wikipedia definition of dependent and independent variables.) I'm starting with this, since you wonder why you use these variables. In order to make your non-linear system linear, you look at the log-transformed version. Even though your $R_j$ are i.i.d. Gaussians, your $r_j$ will not be.

What the least squares regression does is to minimize the square of the error, the error here being your $r_j$. It is common to assume that our underlying model is true in some sense, and that the only reason for any $r_j$ deviating from zero is measurement error. Therefore, we want to scale the problem so that all measurement errors are seen as "equally bad". Ideally, we would want them to be identically distributed Gaussians, because then the least squares solution is a maximum likelihood estimator for the parameters. In this case, even when you rescale your $r_j$, they are not normally distributed anymore, but at least it brings you closer.

You can also see the effect if you rescale some row by a different factor: the residual at that point, in the unscaled version, will decrease, at the cost of higher residuals at the other points. Again, the rescaling here was chosen to approximate a minimization of the errors in the original $R_j$.

If it's not already clear, $r_j$ is put on a separate side because it is the slack variable representing the fitting error, whose square you are trying to minimize. Even in your original statement, $R_j$ is the only value you will never truly quantify; you would never first specify the values of your $R_j$s (rather than their distribution) and then work your way back to the other parameters.

  • 0
This doesn't really seem like an answer... (2017-02-04)

When I look at the problem I get a system where the form should be $\lVert Ax-b\rVert_2=\min$. Why do I make the choice that $A$ will consist of the $\frac{1}{T_j}$ and $b$ to consist of $\ln(K_j)$?

Assuming the equations are $$ \ln(K_j)=\ln(C)-\frac{E}{T_j}+r_j $$ with unknowns $C$ and $E$, I would rewrite as $$ (1, -1/T_j) (\ln(C), E)^\top = \ln(K_j) - r_j \iff \\ A x = b \\ $$ with $$ A = \begin{pmatrix} 1 & -1/T_1 \\ \vdots & \vdots \\ 1 & -1/T_n \end{pmatrix} \quad x = \begin{pmatrix} \ln(C) \\ E \end{pmatrix} \quad b = \begin{pmatrix} \ln(K_1) - r_1 \\ \vdots \\ \ln(K_n) - r_n \end{pmatrix} $$ to get a linear system, and then solve for $$ A^\top A x = A^\top b \iff \\ x = (A^\top A)^{-1} A^\top b $$ to get a least square approximation $x$.
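A direct numpy transcription of this answer might look like the following. The data here are made up (the real $T_j$, $K_j$ measurements aren't given in the thread), and since the $r_j$ are unknown, they are left out of $b$ and become exactly the residuals that least squares minimizes:

```python
import numpy as np

# Made-up data: the "true" parameters are only used to generate test values.
T = np.array([300.0, 320.0, 340.0, 360.0, 380.0])
C_true, E_true = 5.0, 1000.0
rng = np.random.default_rng(0)
K = C_true * np.exp(-E_true / T) * (1.0 + 0.01 * rng.standard_normal(T.size))

# A has rows (1, -1/T_j); b holds ln(K_j).  The unknown r_j are exactly the
# residuals b - Ax that least squares minimizes, so they are not put into b.
A = np.column_stack([np.ones_like(T), -1.0 / T])
b = np.log(K)

# Normal equations A^T A x = A^T b (np.linalg.lstsq is the numerically
# safer alternative to forming A^T A explicitly).
x = np.linalg.solve(A.T @ A, A.T @ b)
lnC, E = x
C = np.exp(lnC)
```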

  • 0
Sorry about the delay in response. But why do you pick A to be a matrix containing your T variables and not C? Is it because T are measured values that I know and I need to calculate what C and E are? (2017-02-08)
  • 1
The vector $x$ contains the unknown variables, in this case $E$ and $\ln(C)$ (where we extract $C$ by exponentiation). $A$ and $b$ are assumed to be known before the optimization procedure. The procedure's task is to come up with an $x$, such that $\lVert Ax - b\rVert_2$ is minimal ("least squares"). (2017-02-08)

When you make the model linear, you minimize $$SSQ=\sum_{i=1}^n \left(\log(y_i^{(calc)})-\log(y_i^{(exp)})\right)^2=\sum_{i=1}^n \Delta_i^2$$ So, consider $$\Delta_i=\log(y_i^{(calc)})-\log(y_i^{(exp)})=\log\left(\frac{y_i^{(calc)}}{y_i^{(exp)}}\right)=\log\left(1+\frac{y_i^{(calc)}-y_i^{(exp)}}{y_i^{(exp)}}\right)$$ If $y_i^{(calc)}-y_i^{(exp)}$ is sufficiently small, remember that, close to $0$, $\log(1+x)\approx x$. So, if the errors are not too large, then $$\Delta_i\approx \frac{y_i^{(calc)}-y_i^{(exp)}}{y_i^{(exp)}}$$ which means that the first step is equivalent to the minimization of the sum of the squares of relative errors.

I hope this will help you for the first part.
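A tiny numerical check of the $\log(1+x)\approx x$ step above, using hypothetical values for $y_i^{(exp)}$ and $y_i^{(calc)}$ that differ by 2%:

```python
import math

# Hypothetical measured and fitted values differing by 2%.
y_exp, y_calc = 100.0, 102.0

rel_err = (y_calc - y_exp) / y_exp              # 0.02
log_diff = math.log(y_calc) - math.log(y_exp)   # log(1.02) ≈ 0.0198
```

The log-difference and the relative error agree to within about $x^2/2$, which is why minimizing the log residuals is nearly the same as minimizing relative errors when the fit is reasonably good.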