
Recently, I thought of the following interesting problem. Given a set of data, I noticed that as the degree of a polynomial increases, in general the $R^2$ value tends to increase too.

I will define the $R^2$ value as follows: for a polynomial $p_k(x)=a_0+a_1 x+\cdots+a_k x^k$, the $R^2$ value for a set of points $(x_i,y_i)$ is $$R^2\equiv 1-\sum_{i=1}^n \left[y_i-(a_0+a_1\cdot x_i +\cdots + a_k\cdot {x_i}^k)\right]^2, \tag{1}$$ where $n$ is the number of points. (This is not the usual normalized definition, which divides the residual sum of squares by the total sum of squares $\sum_i (y_i-\bar y)^2$, but both equal $1$ exactly when every residual vanishes.)

Below I demonstrate the increase of $R^2$ with the following set of data I've made up:

$$\begin{array}{c|c}x&y\\\hline0&-1\\0.5&-0.5\\1.4&-0.9\\2.1&0.2\\2.5&0.7\\3.1&1.7\\4.3&2.3\\5.2&1.5\\5.6&3.5\end{array}$$ Here is an animated GIF I have created showing this:


I realized that the set of data must be many-to-one or one-to-one (i.e. the $x_i$ must be distinct) for the $R^2$ value to tend to $1$; otherwise the interpolating polynomial cannot pass through all the points, since a polynomial is a function.

Therefore, I've conjectured the following, and would like to prove it:

Let $p_k(x)$ be a least-squares fitting polynomial of degree $k$. Consider a discrete many-to-one or one-to-one relationship between $x$ and $y$ (i.e. distinct $x_i$) with finitely many values of $y$. Then $$\lim_{k\to \infty} R^2=1$$ for all sets of data satisfying the above conditions.

I figured that the $R^2$ value will never decrease as $k$ increases.

I know that given $n$ points $(x_i,y_i)$, the following yields the coefficients $a_0, a_1,\cdots,a_k$: $$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\y_n \end{bmatrix}=\begin{bmatrix} 1 & x_1 & {x_1}^2 & \cdots & {x_1}^k \\ 1 & x_2 & {x_2}^2 & \cdots & {x_2}^k \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & {x_n}^2 & \cdots & {x_n}^k \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_k \end{bmatrix} \tag{2}$$

Now, I was thinking that I could combine $(1)$ and $(2)$ to prove it, but I am unsure how to do so. I also noticed a possible problem: a matrix can only be invertible if it is square, so we may be restricted to the specific case $n=k+1$ (a degree-$k$ polynomial has $k+1$ coefficients), which loses the generality of the proof.
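To check the conjecture numerically, here is a short sketch (using NumPy, and using the $R^2$ of definition $(1)$, i.e. $1$ minus the residual sum of squares) that fits least-squares polynomials of increasing degree to the data in the table above:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Data from the table above
x = np.array([0.0, 0.5, 1.4, 2.1, 2.5, 3.1, 4.3, 5.2, 5.6])
y = np.array([-1.0, -0.5, -0.9, 0.2, 0.7, 1.7, 2.3, 1.5, 3.5])

def r_squared(k):
    """R^2 as in equation (1): 1 minus the residual sum of squares
    of the degree-k least-squares polynomial fit."""
    p = Polynomial.fit(x, y, k)          # least-squares fit of degree k
    return 1.0 - np.sum((y - p(x)) ** 2)

r2 = [r_squared(k) for k in range(len(x))]
for k, v in enumerate(r2):
    print(f"degree {k}: R^2 = {v:.6f}")
```

With these $9$ points the values should never decrease with $k$, and at degree $8$ the fit interpolates the data, so $R^2$ reaches $1$ up to floating-point error.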

I was wondering whether proving this might be related to power series, which do something similar for analytic functions for all $x \in \mathbb{R}$. If so, I think the proof may be significantly easier.

If this is not a true conjecture or if some clarification is required, please let me know in the comments. Thanks in advance.

2 Answers


A polynomial of degree $n-1$ can fit $n$ points $(x_i,y_i)$ with distinct $x_i$ perfectly by choosing the right coefficients. Therefore, for a big enough degree of the polynomial, the residual sum vanishes and thus the $R^2$ equals one.


From a linear algebra perspective the result is even clearer. Given a set of $n$ observations, $R^2$ is a non-decreasing function of the polynomial degree in polynomial regression, and for $k=n-1$ (with distinct $x$-values) you have a perfect fit with $R^2 =1$.

The first assertion is true because $\{x^0, x^1, x^2, \dots \}$ is a linearly independent set: enlarging the basis cannot increase the residual sum of squares, hence $R^2$ cannot decrease. (This is, of course, not a proof, but it is the main idea.)

The second assertion is true because when you have, WLOG, $4$ data points $\{(y_i, x_i)\}$ and you are fitting a degree-$3$ polynomial, you are looking at the following four equations: \begin{align} y_{1} &= \beta_0 + \beta_1 x_{11} + \beta_2x_{21} + \beta_3x_{31} \\ y_{2} &= \beta_0 + \beta_1 x_{12} + \beta_2x_{22} + \beta_3x_{32} \\ y_{3} &= \beta_0 + \beta_1 x_{13} + \beta_2x_{23} + \beta_3x_{33} \\ y_{4} &= \beta_0 + \beta_1 x_{14} + \beta_2x_{24} + \beta_3x_{34} \end{align} where $x_{2i}= x_{1i}^2$ and $x_{3i}= x_{1i}^3$, the values of the $x$'s and $y$'s are known (given), and you solve the system for the $\beta$'s. Hence you have $4$ equations in $4$ unknowns, and thus a perfect fit, as $$ \hat{y} = y \in \operatorname{span}\{\mathbf{1}, x, x^2, x^3\}, $$ where the evaluation vectors $\{\mathbf{1}, x, x^2, x^3\}$ form a basis of $\mathbb{R}^4$ (for distinct $x$-values), so $y$ has a unique set of coefficients (a unique representation). For a polynomial of degree less than $3$, you have an overdetermined system of equations with, in general, no exact solution; the least-squares procedure then necessarily gives $R^2 <1$ whenever $y \notin B =\operatorname{span}\{\mathbf{1}, x\}$ (for instance), because $y \in \mathbb{R}^4$ while $\dim B = 2$. And for any polynomial of degree larger than $3$ you have $R^2 = 1$ and infinitely many solutions, because there are more unknown coefficients than equations.

In your specific case you can find that you have a perfect fit for a polynomial regression of $8$th degree, with the following coefficients: $$ (-1, 10.692601, -36.230821, 45.955124, -29.216894, 10.360711, -2.071390, 0.217535, -0.009308) $$
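This interpolant can be recomputed directly (a sketch; the coefficients above are rounded to a few decimals, so small discrepancies in the trailing digits of the recomputed values are expected):

```python
import numpy as np

# The 9 data points from the question
x = np.array([0.0, 0.5, 1.4, 2.1, 2.5, 3.1, 4.3, 5.2, 5.6])
y = np.array([-1.0, -0.5, -0.9, 0.2, 0.7, 1.7, 2.3, 1.5, 3.5])

# 9 points with distinct x-values -> a unique interpolating
# polynomial of degree 8, found by solving the square Vandermonde system.
V = np.vander(x, increasing=True)
a = np.linalg.solve(V, y)   # coefficients a_0, ..., a_8

print(a)                          # compare with the rounded values above
print(np.max(np.abs(y - V @ a)))  # residuals vanish, so R^2 = 1
```

Note that $a_0 = -1$ is forced, since $(0,-1)$ is one of the data points and $p(0)=a_0$.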