How to prove that if $x^*=\text{pseudoinverse}(A)\,b$ and $Ay=b$, then $\Vert x^* \Vert_2 \leq \Vert y \Vert_2$
Least norm solution to $Ax = b$
-
@Marcus. You don't specify, but should it be assumed that $A$ has more columns than rows and has full row rank? – 2012-10-28
3 Answers
Let me try a one-liner solution.
$Ay=b$ $\Rightarrow$ $A(y-x^*)=0$ $\Rightarrow$ $\langle x^*,y-x^*\rangle=0$ (see the remark below) $\Rightarrow$ $\|y\|^2=\|y-x^*\|^2+\|x^*\|^2\ge \|x^*\|^2$.
Remark. Assuming $A$ has full row rank, let $R=A^t(AA^t)^{-1}$, so $x^*=Rb$. Then $\langle x^*,y-x^*\rangle=\langle Rb,y-x^*\rangle=$ $\langle (AA^t)^{-1}b,A(y-x^*)\rangle=0$.
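A small NumPy sketch (not part of the original answer) can illustrate this: it builds a random full-row-rank $A$ with more columns than rows (as the comment above suggests), forms $x^*=A^t(AA^t)^{-1}b$, perturbs it by a null-space component to obtain another solution $y$, and checks the orthogonality and the norm inequality numerically. The matrix sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: full-row-rank A with 3 rows and 5 columns.
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

# Minimum-norm solution x* = A^T (A A^T)^{-1} b (the R b of the remark).
x_star = A.T @ np.linalg.solve(A @ A.T, b)

# Any other solution y: add an arbitrary null-space component to x*.
null_proj = np.eye(5) - np.linalg.pinv(A) @ A      # projector onto null(A)
y = x_star + null_proj @ rng.standard_normal(5)

print(np.allclose(A @ y, b))                        # y also solves Ay = b
print(np.isclose(x_star @ (y - x_star), 0))         # <x*, y - x*> = 0
print(np.linalg.norm(x_star) <= np.linalg.norm(y))  # ||x*||_2 <= ||y||_2
```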
You essentially want to solve the following optimization problem: $\min_{x}\Vert x \Vert_2 \text{ such that } Ax = b.$ Using Lagrange multipliers, we look for a stationary point of $\dfrac{x^Tx}2 + \lambda^T (Ax - b)$ over $x$ and $\lambda$. Differentiating with respect to $x$ and $\lambda$ gives $x^* = \underbrace{A^T(AA^T)^{-1}}_{\text{pseudoinverse}}b$
Proof: $\dfrac{d \left(\dfrac{x^Tx}2 + \lambda^T (Ax - b) \right)}{dx} = 0 \implies x^* + A^T \lambda = 0 \implies x^* = -A^T \lambda$ We also have $Ax^* = b \implies AA^T \lambda = -b \implies \lambda = - \left( AA^T\right)^{-1}b$ Hence, $x^* = \underbrace{A^T(AA^T)^{-1}}_{\text{pseudoinverse}}b$
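As a quick sanity check of this closed form (the matrix sizes below are made up for illustration, assuming $A$ has full row rank so that $(AA^T)^{-1}$ exists), one can compare it against NumPy's pseudoinverse and least-squares routines, both of which return the minimum-norm solution for an underdetermined full-row-rank system.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical underdetermined system with full row rank.
A = rng.standard_normal((4, 7))
b = rng.standard_normal(4)

# Closed form from the Lagrangian derivation: x* = A^T (A A^T)^{-1} b.
x_star = A.T @ np.linalg.solve(A @ A.T, b)

# Same point via the Moore-Penrose pseudoinverse and via lstsq
# (lstsq returns the minimum-norm solution for underdetermined systems).
print(np.allclose(x_star, np.linalg.pinv(A) @ b))
print(np.allclose(x_star, np.linalg.lstsq(A, b, rcond=None)[0]))
print(np.allclose(A @ x_star, b))
```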
-
Hi @Daryl, thank you very much for replying and for the explanations! $A$ is commonly a design matrix or data matrix, and generally we can preprocess it (e.g. remove redundant samples) to make it full rank. Thank you! – 2015-03-28
Let $A: \mathbb{U} \to \mathbb{V}$. In terms of the SVD, we can write $A$ as $A=\sum_{n=1}^R\sigma_nv_nu_n^{\dagger},$ where the $\sigma_n$ are the nonzero singular values, $u_n$ and $v_n$ are the corresponding right and left singular vectors, respectively, and $R$ is the rank of $A$.
Since the right singular vectors $\{u_m\}_{m=1}^M$ (extending $\{u_n\}_{n=1}^R$) form an orthonormal basis of $\mathbb{U}$, we can expand $y$ as $y=\sum_{m=1}^M\alpha_m u_m,$ where $M\ge R$ is the dimension of $\mathbb{U}$.
So $b=Ay=\sum_{n=1}^R\sigma_nv_nu_n^{\dagger}\sum_{m=1}^M\alpha_mu_m=\sum_{n=1}^R\alpha_n\sigma_nv_n$
Also, the pseudoinverse is $A^+=\sum_{m=1}^R\frac{1}{\sigma_m}u_mv_m^{\dagger}$
Then $x^*=A^+b=\sum_{m=1}^R\frac{1}{\sigma_m}u_mv_m^{\dagger}\sum_{n=1}^R\alpha_n\sigma_nv_n=\sum_{m=1}^R\alpha_mu_m$.
Finally, we can see that $\|y\|_2=\left(\sum_{m=1}^M\alpha_m^2\right)^{1/2}$ and $\|x^*\|_2=\left(\sum_{m=1}^R\alpha_m^2\right)^{1/2}.$ Therefore, $\|x^*\|_2 \le \|y\|_2,$ where equality holds when $M=R$ or $\alpha_m=0$ for $m=R+1, \cdots, M$.
In other words, if we define $\|y_{null}\|_2=\left(\sum_{m=R+1}^M\alpha_m^2\right)^{1/2},$ we have $\|y\|_2^2=\|x^*\|_2^2+\|y_{null}\|_2^2.$
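Here is a short numerical sketch of this SVD argument (the dimensions and rank below are arbitrary choices, not from the answer): it builds a rank-deficient $A$, computes $x^*=A^+b$ from the truncated SVD, and checks the decomposition $\|y\|_2^2=\|x^*\|_2^2+\|y_{null}\|_2^2$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: dim(U) = 6 columns, dim(V) = 4 rows, rank R = 3,
# so R < M and the null-space component of y is visible.
M, N, R = 6, 4, 3
A = rng.standard_normal((N, R)) @ rng.standard_normal((R, M))

# NumPy returns A = U_np diag(S) Vh_np; rename to match the answer's
# notation, where v_n are left and u_n are right singular vectors.
V, S, Uh = np.linalg.svd(A, full_matrices=False)
V, S, Uh = V[:, :R], S[:R], Uh[:R, :]          # keep nonzero singular values

y = rng.standard_normal(M)                      # any vector in U
b = A @ y

# Pseudoinverse applied to b, built from the truncated SVD as in the answer.
x_star = Uh.T @ ((V.T @ b) / S)

y_null = y - x_star                             # component of y in null(A)
print(np.allclose(A @ x_star, b))               # x* solves Ax = b
print(np.isclose(np.linalg.norm(y) ** 2,
                 np.linalg.norm(x_star) ** 2 + np.linalg.norm(y_null) ** 2))
```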