In my work, I have repeatedly stumbled across the matrix $\Lambda=X(X^TX)^{-1}X^{T}$, where $X$ is a given generic $m\times n$ matrix with $m>n$. It can be characterized by the following:

(1) If $v$ is in the span of the column vectors of $X$, then $\Lambda v=v$.

(2) If $v$ is orthogonal to the span of the column vectors of $X$, then $\Lambda v = 0$.

(We assume that $X$ has full column rank, so that $X^TX$ is invertible.)
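A quick numerical check of (1) and (2) (a minimal sketch in NumPy; the random test matrix and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
X = rng.standard_normal((m, n))          # generic full-rank m x n matrix, m > n
Lam = X @ np.linalg.inv(X.T @ X) @ X.T   # Lambda = X (X^T X)^{-1} X^T

# (1) If v lies in the column space of X, then Lambda v = v.
v = X @ rng.standard_normal(n)
assert np.allclose(Lam @ v, v)

# (2) If w is orthogonal to the column space of X, then Lambda w = 0.
w = rng.standard_normal(m)
w = w - Lam @ w                          # remove the column-space component
assert np.allclose(X.T @ w, 0)           # w is orthogonal to every column of X
assert np.allclose(Lam @ w, 0)
```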

I find this matrix neat, but for my work (in statistics) I need more intuition about it. What does it mean in a probability context? We are deriving properties of linear regression, where each row of $X$ is an observation.

Is this matrix known, and if so, in what context? (Statistics would be optimal, but if it is a celebrated operation in differential geometry, I'd be curious to hear about that as well.)

  • Oh, I actually figured it out now: $(I-\Lambda)$ takes outcomes $Y$, does a regression, and spits out the estimated residuals. If anyone else is interested, I'll elaborate more. (2011-06-23)
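A minimal sketch of the claim in that comment (NumPy; the setup is illustrative): $(I-\Lambda)y$ coincides with the estimated residuals $y - X\hat\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

Lam = X @ np.linalg.inv(X.T @ X) @ X.T
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# (I - Lambda) y equals the estimated residuals y - X beta_hat
assert np.allclose((np.eye(m) - Lam) @ y, y - X @ beta_hat)
```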

2 Answers


This matrix is known as the hat matrix. The idea is that it "puts the hat on": it transforms the dependent variable $y$ into its prediction $\hat{y}$ in linear regression.

The linear regression model is the following:

$y=X\beta+\varepsilon.$

The least squares estimate of $\beta$ is defined as

$\hat\beta=(X^TX)^{-1}X^Ty.$

The prediction of the model is then:

$\hat{y}=X\hat\beta=X(X^TX)^{-1}X^Ty.$

So the matrix $X(X^TX)^{-1}X^T$ maps $y$ to $\hat{y}$, hence the name "hat matrix".
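A minimal numerical sketch of this (NumPy; the random data is my own): the hat matrix reproduces the prediction of a standard least-squares solve. In practice one avoids forming $(X^TX)^{-1}$ explicitly; with the reduced QR factorization $X=QR$, the hat matrix is simply $QQ^T$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 10, 3
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Hat matrix and the prediction it produces
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y

# Same prediction from a standard least-squares solve
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(y_hat, X @ beta_hat)

# Numerically safer equivalent: reduced QR gives H = Q Q^T
Q, R = np.linalg.qr(X)                   # X = QR, Q has orthonormal columns
assert np.allclose(H, Q @ Q.T)
```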

  • Beautiful. This was exactly what I was looking for. I just (well, 35 minutes ago) managed to deduce the relevant properties, but knowing that it is called the hat matrix might be very useful! Thanks! (2011-06-23)

This should be a comment, but I can't leave comments yet. As pointed out by Rahul Narain, this is the orthogonal projection onto the column space of $X$.
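For completeness, the projection property can be checked directly: $\Lambda$ is symmetric (since $X^TX$ is), and it is idempotent because

$\Lambda^2 = X(X^TX)^{-1}X^TX(X^TX)^{-1}X^T = X(X^TX)^{-1}X^T = \Lambda,$

and a symmetric idempotent matrix is exactly an orthogonal projection; its range here is the column space of $X$.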