1

I just discovered today that $\mathbf{1}=(1,\ldots,1)$ is an eigenvector with eigenvalue $1$ of $X(X^TX)^{-1}X^T$ for certain $X \in \mathbb{R}^{m\times n}$ where $n \leq m$ and where the first column of $X$ is $\mathbf{1}$. Why is this?

The motivation for this comes from the statement that the residuals of a linear least squares fit sum to 0. In other words, given the standard linear model $y = X\beta + \epsilon$, where $\epsilon$ is a random error, the least squares estimate of $\beta$ is $\hat \beta = (X^TX)^{-1}X^Ty$ and the fitted value is $\hat y = X\hat \beta = X(X^TX)^{-1}X^Ty$. It is also important to note that $X$ is a design matrix and thus has an intercept column in which every element is $1$. Then $\mathbf{1}^T(y - \hat y) = 0$.

Through MATLAB, I inferred that this is because $X(X^TX)^{-1}X^T$ has $\mathbf{1}$ as an eigenvector with eigenvalue $1$. I couldn't prove why this is the case.
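As a minimal numerical sketch of both claims (in Python/NumPy rather than MATLAB; the sizes and random data below are arbitrary illustration values):

    # Check that the hat matrix H = X (X^T X)^{-1} X^T fixes the all-ones
    # vector when X has an intercept column, and that residuals sum to zero.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 50, 3                                   # m observations, n predictors (n <= m)
    X = np.column_stack([np.ones(m), rng.standard_normal((m, n - 1))])
    y = rng.standard_normal(m)

    H = X @ np.linalg.inv(X.T @ X) @ X.T
    ones = np.ones(m)
    print(np.allclose(H @ ones, ones))             # True: H 1 = 1

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimate
    residuals = y - X @ beta_hat
    print(abs(residuals.sum()) < 1e-10)            # True: residuals sum to ~0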

  • that matrix is the projection on the row (?) space of $X$, so if you have a constant in your regression ... (2012-08-28)
  • @mike: you mean the orthogonal projection on the column space. So if $\bf 1$ is a column ... (2012-08-28)
  • Are you sure $m \le n$? If so, then $m$ must be equal to $n$, and $\mathbf{X}$ must be a full rank square matrix in order to ensure $(\mathbf{X}^T\mathbf{X})^{-1}$ exists. (2012-08-28)
  • @chaohuang Oops. Fixed it. (2012-08-29)

3 Answers

2

The first column of $X$ is $X e$, where $e = (1,0,\ldots,0)^T$. If $v = X w$ for some vector $w$, then $$X (X^T X)^{-1} X^T v = X (X^T X)^{-1} (X^T X) w = X w = v.$$ In particular, taking $w = e$ gives $v = Xe = \mathbf{1}$, so $X(X^TX)^{-1}X^T\mathbf{1} = \mathbf{1}$.
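A quick numerical illustration of this identity (a NumPy sketch; the matrix below is just an arbitrary design matrix with an intercept column):

    # Verify H (X w) = X w for an arbitrary w, and in particular for w = e,
    # where X e is the first (all-ones) column of X.
    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 30, 4
    X = np.column_stack([np.ones(m), rng.standard_normal((m, n - 1))])
    H = X @ np.linalg.inv(X.T @ X) @ X.T

    w = rng.standard_normal(n)                     # arbitrary coefficient vector
    e = np.eye(n)[:, 0]                            # e = (1, 0, ..., 0)^T

    print(np.allclose(H @ (X @ w), X @ w))         # True: H (X w) = X w
    print(np.allclose(H @ (X @ e), np.ones(m)))    # True: H (X e) = H 1 = 1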

3

$X(X^TX)^{-1}X^Ty$ is the projection of $y$ onto the column space of $X$. If $y = [1\ldots1]^T$, then $y$ lies in the column space of $X$ because the first column of $X$ is $[1\ldots1]^T$. Hence, the projection will be $y$ again. Therefore, $X(X^TX)^{-1}X^Ty = y$, i.e. $y$ is an eigenvector of this projection matrix with eigenvalue $1$.
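To make the projection picture concrete, here is a small NumPy sketch (with arbitrary test data): the residual $y - Hy$ is orthogonal to every column of $X$, and $H$ is idempotent, as a projection should be.

    # H y is the orthogonal projection of y onto the column space of X,
    # so y - H y is orthogonal to every column of X, and H^2 = H.
    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 25, 3
    X = np.column_stack([np.ones(m), rng.standard_normal((m, n - 1))])
    y = rng.standard_normal(m)
    H = X @ np.linalg.inv(X.T @ X) @ X.T

    print(np.allclose(X.T @ (y - H @ y), 0))       # True: residual orthogonal to col(X)
    print(np.allclose(H @ H, H))                   # True: projecting twice changes nothing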

3

In typical problems of this kind, $X$ has many more rows than columns. You wrote $X \in \mathbb{R}^{m\times n}$ with $m\le n$; that doesn't make sense.

One has $X\in\mathbb{R}^{n\times m}$ and $m\le n$ (you see, I wrote $n\times m$ rather than $m\times n$). Then $X^T X$ is an $m\times m$ matrix, and $m\le n$, so it's a small matrix, maybe as small as $2\times 2$ in some typical applications. If the columns of $X$ are linearly independent then $X^T X$ is invertible, i.e. $(X^T X)^{-1}$ exists. Now let $H=X(X^T X)^{-1}X^T$. The rank of the big $n\times n$ matrix $H$ is the small number $m$.

Now:

  • If $v$ is in the column space of $X$ then $Hv=v$.
  • If $v$ is orthogonal to the column space of $X$ then $Hv=0$.

So there you have two eigenspaces with eigenvalues $0$ and $1$, and since they're orthogonal complements of each other, they span the whole space.

Here's a proof of the two bulleted points. The second one is the easier one: Suppose $v$ is orthogonal to the column space of $X$. Then clearly $X^T v=0$. Therefore $Hv = X(X^T X)^{-1}X^T v$ $= X(X^T X)^{-1}0 = 0$. Now suppose $v$ is in the column space of $X$. Then $v=Xu$ for some column vector $u$ of (small) length $m$, and $$ Hv = HXu = \big(X(X^T X)^{-1} X^T\big) Xu = X(X^T X)^{-1} (X^T X) u = Xu = v. $$

So if the column $\mathbf{1}$ of $1$s is a column of $X$ (as it is for a design matrix with an intercept), then it lies in the column space of $X$, and therefore $H\mathbf{1}=\mathbf{1}$.
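A numerical sketch of the eigenstructure described above (NumPy, using this answer's convention of $n$ rows and $m$ columns; the data are arbitrary):

    # The eigenvalues of H are only 0 and 1; the eigenvalue 1 has multiplicity
    # m = rank(X), and the all-ones vector lies in the eigenvalue-1 eigenspace.
    import numpy as np

    rng = np.random.default_rng(3)
    n, m = 40, 3                                   # n rows, m columns (m <= n)
    X = np.column_stack([np.ones(n), rng.standard_normal((n, m - 1))])
    H = X @ np.linalg.inv(X.T @ X) @ X.T

    eigvals = np.linalg.eigvalsh(H)                # H is symmetric, so eigvalsh applies
    expected = np.concatenate([np.zeros(n - m), np.ones(m)])
    print(np.allclose(np.sort(eigvals), expected)) # True: n - m zeros and m ones
    print(np.allclose(H @ np.ones(n), np.ones(n))) # True: H 1 = 1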