3
$\begingroup$

I’m quite stuck on the question below. It keeps coming up in my exam and I don’t know how to do it! Please help me! Thanks!

Show that the matrix $P = A (A^tA)^{-1} A^t$ represents an orthogonal projection onto $\mathcal{R}(A)$. Hence or otherwise explain why $x^\star = (A^t A)^{-1} A^t b$ represents the least squares solution of the matrix equation $Ax=b$.

  • 0
    Yep I had a look on wiki. The problem is that I don’t really understand the concept behind it. It just looks like a bunch of notation to me. I’m not really sure anymore what an orthogonal projection means - isn’t it idempotent and symmetric? Do you need to show the range's and nullspaces are equal etc. and for the X* bit do you just work out A$x$* (using x* given) and then show its =b? Maybe it’s the way its worded that’s confusing me... What exactly does a projection even do? All I know is P(v)=v!2011-05-20

1 Answers 1

4

Well, okay, assuming $A$ is overdetermined ($m \times n$ where m > n). The column of vectors of $A$ spans $\mathcal R(A)$, assuming $A$ has full rank the column vectors form a basis.

The matrix $A^T A$ will be symmetric ($(A^TA)^T = A^T A$), and if $A$ has full rank, invertible. If $A$ does not have full rank, it will not be invertible.

Now, naming the column vectors in $A$ $a_1, a_2, \dots, a_n$ and $x = (x_1, x_2 \dots, x_n)^T$ we can write $Ax$ as: $Ax = x_1 a_1 + x_2 a_2 + \dots + x_n a_n = \sum_{i=1}^n a_i x_i$ and we want this to be equal to $b$. Assuming there is no such solution, we want to minimize $\| b - Ax \| = \left\| b - \sum_{i=1}^n a_i x_i \right\|$ To minimize this, we want to find the projection of $b$ onto $\mathcal R(A)$.This projection can be expressed as $\sum_{i=1}^n a_i x_i$, since the $a_i$ span $\mathcal R(A)$. Why does this minimze the above equation? Well, simply because this is the closest we can get - we can't express elements not in $\mathcal R(A)$ with elements in $\mathcal R(A)$.

Now, assume $x_1, x_2, \dots, x_n$ are the coefficients to $a_1, a_2, \dots a_n$ such that $\sum a_i x_i$ is the orthogonal projection of $b$ onto $\mathcal R(A)$. Then $b$ can written as $b = Ax + y$ for some $y$ which is orthogonal to $\mathcal R(A)$. Thus $y = b - Ax$. Since $y$ is orthogonal to $\mathcal R(A)$, it is orthogonal to every $a_i$: $ \left\langle a_i , \underbrace{b - Ax}_{=y} \right\rangle = 0 \Longleftrightarrow a_i^T(b-Ax)=0 ~~~~ \forall i = 1, \dots, n $ This can be written in matrix form as $A^T(b-Ax)=0$ taking the equations above for all $i$ at once, which can be rewritten as $A^Tb = A^TAx$, or if $A^TA$ is invertible, $x = (A^TA)^{-1}A^Tb$.

So, basically, to derive $x = (A^TA)^{-1}A^T b$, consider the condition that $b-Ax$ should be orthogonal to the column vectors of $a_i$ one at a time, then stack these equations "on top of each other" in a matrix equation.

  • 0
    Okay - understood! :)2011-05-25