2

Let $A\in \mathbb{R}^{m\times n}$ and $b\in \operatorname{Range}(A)$. How can we find the minimum-norm solution of $Ax=b$ using Lagrange multipliers? (Basically, I am looking for a proof of the pseudoinverse formula using Lagrange multipliers.) Here is my attempt: $$\min_x\frac{1}{2}\|x\|^2\quad\text{s.t. }Ax=b$$ $$L(x,\lambda)=\frac{1}{2}\|x\|^2+\lambda^T(Ax-b)$$ $$\nabla_x L(x,\lambda)=x+A^T\lambda=0$$ which gives $x=-A^T\lambda$ and, after substituting into $Ax=b$, $-AA^T\lambda=b$. Now $AA^T$ might not be invertible, so how can we proceed to solve for $x$ and $\lambda$? Any hints?

  • 1
    It doesn't matter whether or not $AA^T$ is invertible, as long as $b$ is in the range of $AA^T$. 2017-02-15
  • 0
    @MichaelGrant Yeah, right. But since $b\in range(A)$, for every $y$ such that $Ay=b$ we have $A^T\lambda=-y$, and so $x=y$. So it looks like every $y$ with $Ay=b$ is a solution, which is not correct. Where am I going wrong? Thanks. 2017-02-15
  • 0
    @user1131274 Your $\rm y$ is any solution of the linear system $\rm Ax = b$. Why are you surprised? 2017-02-16
  • 0
    Well, the solution is definitely unique, as the objective is strongly convex. But the fact that $-AA^T\lambda =b$ admits multiple solutions for $\lambda$ doesn't pose a problem. After all, what if $A^T\lambda$ is unique? 2017-02-16
  • 0
    @MichaelGrant But to be sure I understand this correctly: as I said in my earlier comment, even if $A^T\lambda$ is unique for a given $y$, there are lots of $y$, and each $y$ looks like a solution since $x=-A^T\lambda$. But that argument fails, because any $y$ that is not in $range(A^T)$ admits no solution for $\lambda$. Only those solutions of $Ax=b$ that lie in $range(A^T)$ matter, and there is exactly one of them, since $range(A^T)$ is orthogonal to $kernel(A)$ and every $x$ with $Ax=b$ has the same projection onto $range(A^T)$. Is this right? 2017-02-16
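
The argument in the last comment can be written out as a short worked derivation. This is only a sketch: the symbols $y_r$ and $y_n$ are introduced here for illustration and do not appear in the thread. Any solution $y$ of $Ay=b$ splits orthogonally as $$y = y_r + y_n,\qquad y_r\in\mathop{\textrm{Range}}(A^T),\quad y_n\in\mathop{\textrm{Ker}}(A),\qquad\text{so}\quad Ay_r = Ay = b \quad\text{and}\quad \|y\|^2=\|y_r\|^2+\|y_n\|^2.$$ If $x_1$ and $x_2$ both solve $Ax=b$ and both lie in $\mathop{\textrm{Range}}(A^T)$, then $$x_1-x_2\in\mathop{\textrm{Range}}(A^T)\cap\mathop{\textrm{Ker}}(A)=\{0\},$$ so $y_r$ is the same vector for every solution $y$. The stationarity condition $x=-A^T\lambda$ can only be satisfied when $y_n=0$, which leaves $x=y_r$ as the single candidate, and $\|y\|\ge\|y_r\|$ shows it has the smallest norm.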

1 Answer

1

Let $A=U\Sigma V^T$ be the economy SVD of $A$; that is, $$U\in\mathbb{R}^{m\times p} \quad \Sigma\in\mathbb{R}^{p\times p} \quad V\in\mathbb{R}^{n\times p} \quad U^TU=V^TV=I_p \quad p=\mathop{\textrm{rank}}(A)=\mathop{\textrm{rank}}(\Sigma)$$ Since we know that $b\in\mathop{\textrm{Range}}(A)$, it must be the case that $b=U q$ for some vector $q\in\mathbb{R}^p$. Similarly, the optimality condition $x+A^T\lambda=0$ implies that $x\in\mathop{\textrm{Range}}(A^T)$, which means that $x=Vr$ for some vector $r\in\mathbb{R}^p$. So now we have, using $V^TV=I_p$, left-multiplying by $U^T$, and noting that $\Sigma$ is diagonal with positive entries and hence invertible, $$\begin{aligned} &Ax=b \quad\Longrightarrow\quad U\Sigma V^T (Vr) = U q \quad\Longrightarrow\quad \Sigma r = q \quad\Longrightarrow\quad r = \Sigma^{-1} q\\ &\quad\Longrightarrow\quad x = Vr = V\Sigma^{-1}q = V\Sigma^{-1}U^TUq=V\Sigma^{-1}U^T b = A^\dagger b. \end{aligned}$$ So the solution is indeed identical to the one obtained from the pseudoinverse.
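
The derivation above can be checked numerically. The following is a minimal NumPy sketch, not part of the original answer: the function name `min_norm_solution`, the cutoff `rtol`, and the random rank-2 test matrix are all assumptions made for illustration. It builds $x=V\Sigma^{-1}U^Tb$ from the truncated SVD and compares it with `np.linalg.pinv` and with the Lagrange route, where any solution $\lambda$ of $-AA^T\lambda=b$ is mapped to $x=-A^T\lambda$.

```python
import numpy as np


def min_norm_solution(A, b, rtol=1e-10):
    """Return x = V Sigma^{-1} U^T b from the economy SVD truncated to rank(A).

    rtol is an assumed relative cutoff for deciding which singular values
    count as nonzero; it is an illustrative choice, not from the answer.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    p = int(np.sum(s > rtol * s[0]))       # numerical rank of A
    U, s, Vt = U[:, :p], s[:p], Vt[:p, :]  # keep only the rank-p factors
    q = U.T @ b                            # b = U q  (b is in Range(A))
    r = q / s                              # r = Sigma^{-1} q
    return Vt.T @ r                        # x = V r, which lies in Range(A^T)


# Hypothetical test problem: a 4x6 matrix of rank 2, so A A^T is singular.
rng = np.random.default_rng(0)
m, n, rank = 4, 6, 2
A = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
b = A @ rng.standard_normal(n)             # guarantees b is in Range(A)

x_svd = min_norm_solution(A, b)

# Reference value from the pseudoinverse (explicit cutoff for the zero
# singular values of the rank-deficient A).
x_pinv = np.linalg.pinv(A, rcond=1e-10) @ b

# Lagrange route: pick one solution lambda of the singular but consistent
# system -A A^T lambda = b, then map it back through x = -A^T lambda.
lam = np.linalg.lstsq(-(A @ A.T), b, rcond=1e-10)[0]
x_lagrange = -A.T @ lam

print(np.allclose(x_svd, x_pinv), np.allclose(x_lagrange, x_pinv, atol=1e-8))
```

The test matrix is deliberately rank-deficient so that $AA^T$ is not invertible, which is the case the question worries about; the multiplier is then not unique, but $A^T\lambda$ is.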

  • 0
    So basically the Lagrange equation tells us that the best solution lies in $range(A^T)$, and that solution is in fact unique. Nice. Thanks. 2017-02-16