I need to interpolate a linear trend surface through a number of points but with the condition that the surface has to pass exactly through one of them. Can somebody give me any advice?
Linear trend has to pass through a point
-
Thank you everybody for your kind and very helpful replies. I would like to add that I am sorry my question was not so clear. What I am looking for is Equality Constrained Least Squares. I have a set of n points $(x_i,y_i,z_i)$ and I want to find the plane $z=ax+by+d$ that best fits all the points and passes through a given point $(x_0,y_0,z_0)$. – 2012-12-28
2 Answers
Let me begin with the usual cautionary note about least-squares fitting, that it is unduly sensitive to data points that have larger errors.
A special case (to which the general case is easily reduced) of forcing a fitted plane to pass through a point is to make it pass through the origin. The phrase "regression through the origin" is often used to describe this case. An erudite but readable account is here.
The ordinary least-squares fit would use a model that includes a constant term:
$ f(x,y) = ax + by + c $
and our objective would be to minimize for given observations $(x_i,y_i,z_i)$ the sum of squared errors $\sum_i (z_i - f(x_i,y_i))^2$.
Regression through the origin means omitting the parameter $c$ (setting the constant term to zero). Naturally, with one fewer parameter the least-squares error is typically larger.
If we know which data point we want the trend plane to pass through, say $(x_0,y_0,z_0)$, then we can transform the data by subtracting that point from every other observation:
$ (u_i,v_i,w_i) = (x_i,y_i,z_i) - (x_0,y_0,z_0) $
and performing "regression through the origin" (RTO) on this transformed data set.
Because the suggestion is to do ordinary least-squares (OLS) fitting first in order to pick a "base point", we will derive the normal equations for that case and note that for RTO these reduce (simplify) by essentially omitting the constant term. That is, the OLS objective function:
$ \sum (z_i - ax_i - by_i - c)^2 $
gives, by partial differentiation with respect to the model parameters $a,b,c$, a system of three linear equations in those unknowns:
$ \begin{pmatrix} \sum x_i^2 & \sum x_i y_i & \sum x_i \\ \sum x_i y_i & \sum y_i^2 & \sum y_i \\ \sum x_i & \sum y_i & \sum 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \sum x_i z_i \\ \sum y_i z_i \\ \sum z_i \end{pmatrix} $
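For anyone who would rather check these normal equations numerically, here is a minimal sketch in plain Python. The function names `solve3` and `ols_plane` are my own (not from any library), and the small test data set is made up so that it lies exactly on a known plane.

```python
def solve3(m, r):
    """Solve the 3x3 linear system m * p = r by Cramer's rule."""
    def det3(q):
        return (q[0][0] * (q[1][1] * q[2][2] - q[1][2] * q[2][1])
                - q[0][1] * (q[1][0] * q[2][2] - q[1][2] * q[2][0])
                + q[0][2] * (q[1][0] * q[2][1] - q[1][1] * q[2][0]))

    d = det3(m)
    if d == 0:
        raise ValueError("singular normal equations (are the (x, y) collinear?)")
    solution = []
    for k in range(3):
        # Replace column k of m by the right-hand side r.
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = r[i]
        solution.append(det3(mk) / d)
    return solution


def ols_plane(points):
    """Fit z = a*x + b*y + c by ordinary least squares via the normal equations."""
    n = len(points)
    sx = sum(x for x, y, z in points)
    sy = sum(y for x, y, z in points)
    sz = sum(z for x, y, z in points)
    sxx = sum(x * x for x, y, z in points)
    sxy = sum(x * y for x, y, z in points)
    syy = sum(y * y for x, y, z in points)
    sxz = sum(x * z for x, y, z in points)
    syz = sum(y * z for x, y, z in points)
    m = [[sxx, sxy, sx],
         [sxy, syy, sy],
         [sx, sy, n]]
    a, b, c = solve3(m, [sxz, syz, sz])
    return a, b, c


# Made-up points lying exactly on z = 2x - 3y + 5.
pts = [(0, 0, 5), (1, 0, 7), (0, 1, 2), (1, 1, 4), (2, 3, 0)]
print(ols_plane(pts))  # recovers a = 2, b = -3, c = 5 up to rounding
```

Cramer's rule is used only to keep the sketch dependency-free; for larger or badly conditioned problems a proper linear algebra routine would be the safer choice.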
Once the OLS parameters are found, we check which observation $(x_i,y_i,z_i)$ has the least error $| z_i - f(x_i,y_i) |$. Taking that as our base point and subtracting it from the other observations, as described above, gives an RTO problem of two equations in two unknowns:
$ \begin{pmatrix} \sum u_i^2 & \sum u_i v_i \\ \sum u_i v_i & \sum v_i^2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum u_i w_i \\ \sum v_i w_i \end{pmatrix} $
Of course this RTO solution would then get translated back to the original coordinates by adding the base point to its formula.
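The shift, the 2x2 RTO solve, and the translation back can be sketched in a few lines of plain Python. This is only an illustration under my own naming: `fit_plane_through_point` is a hypothetical helper, and the sample points are invented.

```python
def fit_plane_through_point(points, base):
    """Least-squares plane z = z0 + a*(x - x0) + b*(y - y0) forced through base.

    points: iterable of (x, y, z) observations
    base:   the point (x0, y0, z0) the plane must pass through
    Returns (a, b, plane) where plane(x, y) evaluates the fitted surface.
    """
    x0, y0, z0 = base
    # Translate so that the base point sits at the origin.
    shifted = [(x - x0, y - y0, z - z0) for x, y, z in points]

    suu = sum(u * u for u, v, w in shifted)
    suv = sum(u * v for u, v, w in shifted)
    svv = sum(v * v for u, v, w in shifted)
    suw = sum(u * w for u, v, w in shifted)
    svw = sum(v * w for u, v, w in shifted)

    # Solve the 2x2 RTO normal equations by Cramer's rule.
    det = suu * svv - suv * suv
    if det == 0:
        raise ValueError("shifted (u, v) data are collinear; slopes not unique")
    a = (suw * svv - suv * svw) / det
    b = (suu * svw - suv * suw) / det

    def plane(x, y):
        # Translate the RTO solution back to the original coordinates.
        return z0 + a * (x - x0) + b * (y - y0)

    return a, b, plane


# Invented data lying on z = 1 + 0.5*x - 2*y, with the base point on that plane.
pts = [(0, 0, 1.0), (2, 0, 2.0), (0, 1, -1.0), (2, 1, 0.0)]
a, b, plane = fit_plane_through_point(pts, (1, 1, -0.5))
print(a, b, plane(1, 1))
```

Whether the base point itself is included in `points` makes no difference: after the shift it becomes (0, 0, 0) and contributes nothing to any of the sums.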
If there is a small data set you'd like to see me use, post a Comment. Otherwise I can make one up for the sake of illustration.
Example: Suppose we wish to fit these data points:
(59, 64, 10.91) (75, 52, 10.38) (86, 73, 10.6) (88, 53, 10.56)
with a plane passing through (69, 76, 10.00). Translating the latter to the origin transforms the other four points into:
(-10, -12, 0.91) ( 6, -24, 0.38) (17, -3, 0.6) (19, -23, 0.56)
Using an ad hoc Excel spreadsheet I calculated the appropriate sums to compute RTO for these transformed data:
$ \begin{pmatrix} 786 & -512 \\ -512 & 1258 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 14.02 \\ -34.72 \end{pmatrix} $
Solving this system gives (to display accuracy) $a = -0.00019, b = -0.02768$. Putting things back in terms of the original coordinates means this is the best (least squares) fit with a plane through (69, 76, 10.00):
$ z_{fit} = 10.00 - 0.00019(x - 69) - 0.02768(y - 76) $
The fit doesn't seem very good, although adding 10 to each z-coordinate tends to obscure that. The fitting errors in z at the four data points are:
$ z_i - z_{fit} = 0.575951, -0.28311, 0.520231, -0.07294 $
and the sum of squared errors is $0.687829$.
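If you don't want to trust a spreadsheet, the numbers in this example can be reproduced with a short plain-Python script. It uses only the data quoted above; the variable names are mine.

```python
base = (69, 76, 10.00)
data = [(59, 64, 10.91), (75, 52, 10.38), (86, 73, 10.6), (88, 53, 10.56)]

x0, y0, z0 = base
# Translate the base point to the origin.
shifted = [(x - x0, y - y0, z - z0) for x, y, z in data]

# Entries of the 2x2 RTO normal equations.
suu = sum(u * u for u, v, w in shifted)
suv = sum(u * v for u, v, w in shifted)
svv = sum(v * v for u, v, w in shifted)
suw = sum(u * w for u, v, w in shifted)
svw = sum(v * w for u, v, w in shifted)

# Cramer's rule for the two slopes.
det = suu * svv - suv * suv
a = (suw * svv - suv * svw) / det
b = (suu * svw - suv * suw) / det

# Fitting errors z_i - z_fit at the four data points, and their squared sum.
residuals = [w - (a * u + b * v) for u, v, w in shifted]
sse = sum(r * r for r in residuals)

print(suu, suv, svv)            # 786 -512 1258
print(f"a = {a:.5f}, b = {b:.5f}")
print([round(r, 6) for r in residuals])
print(f"SSE = {sse:.6f}")
```

The printed slopes, residuals, and SSE should agree with the figures above to the displayed accuracy.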
-
I understand that my English caused some misunderstanding. Sorry, and thank you for your very helpful reply. – 2012-12-28
Once you fix the point the line must pass through, the solutions form a one-parameter family, indexed by the slope. If the given point is $(x_0,y_0)$, the lines passing through it are
$ y=y_0+m(x-x_0) $
If you want the best fit in a least squares sense, let the data be $(x_i,y_i)$ with $i$ ranging from $1$ to $n$. We want to minimize
$ \sum_{i=1}^n (y_i-y_0-m(x_i-x_0))^2 $
Taking the derivative with respect to $m$ and setting it to zero gives
$ 0=2\sum_{i=1}^n (y_i-y_0-m(x_i-x_0))(x_i-x_0) $
so
$ m=\frac{\sum_{i=1}^n (y_i-y_0)(x_i-x_0)}{\sum_{i=1}^n (x_i-x_0)^2} $
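The slope formula translates directly into code. A minimal plain-Python sketch (the function name `slope_through_point` and the sample data are my own, for illustration only):

```python
def slope_through_point(data, x0, y0):
    """Least-squares slope m of the line y = y0 + m*(x - x0) through (x0, y0)."""
    num = sum((y - y0) * (x - x0) for x, y in data)
    den = sum((x - x0) ** 2 for x, y in data)
    if den == 0:
        raise ValueError("every x_i equals x0; the slope is undetermined")
    return num / den


# Invented data lying exactly on y = 3 + 2*(x - 1), anchored at (1, 3).
data = [(0, 1), (2, 5), (4, 9)]
print(slope_through_point(data, 1, 3))  # 2.0
```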
I would just do this calculation for each point, and take the answer with the lowest sum squared error.
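A rough sketch of that brute-force search in plain Python, assuming the anchor must be one of the data points themselves. The name `best_anchor_line` and the sample data are hypothetical.

```python
def best_anchor_line(data):
    """Try each data point as the anchor; keep the fit with the lowest SSE.

    Returns (sse, anchor, m) for the line y = y0 + m*(x - x0).
    """
    best = None
    for x0, y0 in data:
        den = sum((x - x0) ** 2 for x, y in data)
        if den == 0:
            continue  # all x equal x0: slope undetermined for this anchor
        m = sum((y - y0) * (x - x0) for x, y in data) / den
        sse = sum((y - y0 - m * (x - x0)) ** 2 for x, y in data)
        if best is None or sse < best[0]:
            best = (sse, (x0, y0), m)
    if best is None:
        raise ValueError("no usable anchor: all x values coincide")
    return best


# Invented, slightly noisy data.
data = [(0, 0.1), (1, 0.9), (2, 2.2), (3, 2.9)]
sse, anchor, m = best_anchor_line(data)
print(anchor, m, sse)
```

The cost is one O(n) fit per candidate, so O(n^2) overall, which is negligible for small data sets.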
-
@fparaggio: The same approach works. You can define the slope of $z$ vs $x$ at constant $y$ and the slope of $z$ vs $y$ at constant $x$. These two parameters will give you two equations in two unknowns that you can solve. – 2012-12-28