I need to write some code for an application that takes in a series of 2D points whose values are integers, and determines a polynomial regression that passes through the origin. I know how to do this via a CAS, but is anyone familiar with the math behind a regression of this type?
How to do a regression with only integer values and a fixed intercept?
-
If you look at the first paragraph in the Wikipedia article I linked to, the polynomial will go exactly through all of the points in your dataset; since $(0,0)$ will be in that dataset, $f(0)\equiv 0$. Hope this helps! **N.B.:** This method only works when no $x$ value appears with differing $y$ values. If you do have duplicate $x$ values with different $y$ values, I guess you could take an average of the $y$ values and just use the point $(x,y_{\text{avg}})$ in your dataset. – 2012-08-09
2 Answers
As always with least-squares regression, we look to the definitions for starters. I'll do the case of fitting a straight line through the origin; the case of higher-degree polynomials will involve partial derivatives, but is otherwise straightforward.
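For the higher-degree case mentioned above, setting each partial derivative of the sum of squares to zero for the model $y=a_1x+a_2x^2+\dots+a_dx^d$ gives the linear system $\sum_{j=1}^d a_j\sum_k x_k^{i+j}=\sum_k y_k x_k^i$ for $i=1,\dots,d$. A minimal sketch of solving it, assuming integer (or rational) coordinates so that exact `Fraction` arithmetic applies; the function name `fit_polynomial_through_origin` is made up for illustration:

```python
from fractions import Fraction


def fit_polynomial_through_origin(points, degree):
    """Return [a_1, ..., a_degree] for y = a_1*x + ... + a_degree*x**degree.

    There is no constant term, so the fitted curve passes through (0, 0).
    """
    d = degree
    # Normal equations: sum_j a_j * sum_k x_k**(i+j) = sum_k y_k * x_k**i
    A = [[Fraction(sum(x ** (i + j) for x, _ in points)) for j in range(1, d + 1)]
         for i in range(1, d + 1)]
    b = [Fraction(sum(y * x ** i for x, y in points)) for i in range(1, d + 1)]

    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(d):
        pivot = next((r for r in range(col, d) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: need more distinct nonzero x values")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(d):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    # A is now diagonal, so each coefficient is a single division.
    return [b[i] / A[i][i] for i in range(d)]
```

For floating-point data a library least-squares routine would likely be the more robust choice, since normal equations can be ill-conditioned at higher degrees. With `degree=1` this reduces to the straight-line case derived next.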
We want to fit the model $y=mx$ to the data points $(x_k,y_k)$ for $k=1,\dots,n$. Least-squares regression demands finding the value of $m$ that minimizes the sum of squares of the residuals:
$S=\frac12\sum_{k=1}^n (m x_k-y_k)^2$
To do that, we differentiate with respect to $m$, and equate to zero:
$\frac{\mathrm dS}{\mathrm dm}=\sum_{k=1}^n (m x_k-y_k)x_k=0$
and it is rather easy to solve for $m$:
$\begin{align*} \sum_{k=1}^n (m x_k-y_k)x_k&=0\\ m \sum_{k=1}^n x_k^2-\sum_{k=1}^n x_k y_k&=0 \end{align*}$
and we end up with
$\color{blue}{m=\frac{\sum\limits_{k=1}^n x_k y_k}{\sum\limits_{k=1}^n x_k^2}}$
In words: multiply the abscissas and ordinates together, total them up, and divide this total by the sum of the squares of the abscissas.
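Since the question asks for code, here is a minimal sketch of that formula in Python. It assumes the points are pairs of integers, as in the question, so the slope can be returned as an exact `Fraction`; the function name `slope_through_origin` is just an illustrative choice.

```python
from fractions import Fraction


def slope_through_origin(points):
    """Least-squares slope m of the line y = m*x for integer (x, y) pairs."""
    sum_xy = sum(x * y for x, y in points)  # sum of x_k * y_k
    sum_xx = sum(x * x for x, _ in points)  # sum of x_k squared
    if sum_xx == 0:
        raise ValueError("all x values are zero, so the slope is undefined")
    return Fraction(sum_xy, sum_xx)


# Example: (1*2 + 2*3 + 3*7) / (1 + 4 + 9) = 29/14
print(slope_through_origin([(1, 2), (2, 3), (3, 7)]))
```

Call `float()` on the result if a decimal value is more convenient than an exact ratio.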
If one wants the fitted line to pass through some arbitrary fixed point $(h,k)$ instead (this $k$ is unrelated to the summation index above), subtract $h$ from the abscissas and $k$ from the ordinates, perform the procedure above on the shifted data, and then undo the translation: the fitted line is $y=m(x-h)+k$.
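The translation trick can be sketched the same way. This again assumes integer coordinates, and the names `slope_through_point` and `predict` are hypothetical helpers, not part of any library:

```python
from fractions import Fraction


def slope_through_point(points, h, k):
    """Least-squares slope of the line forced through the fixed point (h, k)."""
    # Translate the data so that (h, k) becomes the origin.
    shifted = [(x - h, y - k) for x, y in points]
    sum_uv = sum(u * v for u, v in shifted)
    sum_uu = sum(u * u for u, _ in shifted)
    if sum_uu == 0:
        raise ValueError("every x equals h, so the slope is undefined")
    return Fraction(sum_uv, sum_uu)


def predict(m, h, k, x):
    """Undo the translation: evaluate y = m*(x - h) + k."""
    return m * (x - h) + k


# Points lying exactly on y = 3*(x - 1) + 2 recover the slope 3.
m = slope_through_point([(2, 5), (3, 8), (0, -1)], 1, 2)
print(m, predict(m, 1, 2, 4))
```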
-
Hah. Yes, I suppose it would have to. I mentally added the $+b$ part. Great, thanks for your help. – 2012-08-10
If the response variable is a count, there are special regression models for that case, including Poisson regression and negative binomial regression. You might want to look at the following books on models for count data:
Regression Analysis of Count Data; A. Colin Cameron, Pravin K. Trivedi
Negative Binomial Regression; Joseph M. Hilbe
Generalized Linear Models and Extensions, Third Edition; James W. Hardin, Joseph M. Hilbe