
I was reading a book on numerical methods and came across the following problem:

Consider the least squares problem in the Euclidean norm: $$\min_x \|b-Ax\|_2,$$ where we know that $A$ is ill-conditioned. Let's replace the normal equations with a better-conditioned system $$(A^TA + \gamma I)x_{\gamma} = A^Tb,$$ where $\gamma > 0$ is a parameter and $I$ is the identity matrix of the appropriate size.

The exercise is to show that $$\|x_{\gamma}\| \leq \|x\|$$

How can I show this?

My idea was to write $x = (A^TA)^{-1}A^Tb$ and $x_{\gamma} = (A^TA + \gamma I)^{-1}A^Tb$, and then to bound $\|x_{\gamma}\|$ from above; I rewrote the problem as $$\frac{1}{\gamma}\|(N + I)^{-1}\| \leq \|N^{-1}\|.$$
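Before hunting for a proof, it can help to convince yourself numerically that the inequality holds. The sketch below (my own construction, not from the book) builds a deliberately ill-conditioned $A$ with geometrically decaying singular values and checks $\|x_\gamma\| \leq \|x\|$ for several values of $\gamma$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ill-conditioned matrix: orthogonal factors times decaying singular values.
m, n = 20, 10
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** -np.arange(n)            # singular values 1, 1e-1, ..., 1e-9
A = U @ np.diag(s) @ V.T
b = rng.standard_normal(m)

# Unregularized least squares solution x.
x = np.linalg.lstsq(A, b, rcond=None)[0]

# Regularized solutions x_gamma for a range of gamma > 0.
for gamma in [1e-8, 1e-4, 1.0, 1e4]:
    x_g = np.linalg.solve(A.T @ A + gamma * np.eye(n), A.T @ b)
    assert np.linalg.norm(x_g) <= np.linalg.norm(x)
```

Every regularized norm comes out no larger than the unregularized one, as the exercise claims.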

  • Is this for all $\gamma$ or just small $\gamma$? Also, is it the Euclidean norm or the $A$ norm? (2017-02-15)
  • The exercise only says that $\gamma > 0$ is a parameter, so I guess it can be anything, but for big $\gamma$ the regularized least squares solution will be far from the original solution. The $||\cdot ||_2$ is the Euclidean norm. (2017-02-15)
  • Oh, you just use the spectral theorem: each eigenvalue of $(A^T A + \gamma I)^{-1}$ is $\frac{1}{\lambda+\gamma}$ where $\lambda$ is a (nonnegative) eigenvalue of $A^T A$. Now note that you can simultaneously and orthogonally diagonalize $A^T A$ and $A^T A + \gamma I$ and grind out the calculation to see that $\| (A^T A + \gamma I)^{-1} y \| \leq \| (A^T A)^{-1} y \|$. (2017-02-15)

2 Answers


Assume that $A^T A$ is invertible. Then by the spectral theorem, for any unit vector $y$, $\| (A^T A + \gamma I)^{-1} y \|^2 = \sum_i c_i(y)^2 (\lambda_i+\gamma)^{-2}$ and $\| (A^T A)^{-1} y \|^2 = \sum_i c_i(y)^2 \lambda_i^{-2}$, where $c_i(y)=q_i^T y$ and the $q_i$ are orthonormal eigenvectors of $A^T A$. Since $(\lambda_i+\gamma)^{-2} \leq \lambda_i^{-2}$ term by term, the first sum is bounded by the second. The orthogonality is crucial here.

When $A^T A$ is not invertible you have a problem: if "inverse" means "Moore-Penrose pseudoinverse", then the zero eigenvalues of $A^T A$ contribute to $\| (A^T A + \gamma I)^{-1} y \|^2$ but not to $\| (A^T A)^{-1} y \|^2$. Maybe the method can be adapted to this case, but you'll need some auxiliary conditions somewhere. (For example, the result clearly fails if $A=0$ and we interpret $(A^T A)^{-1}$ as the Moore-Penrose pseudoinverse.)
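The termwise comparison in this answer can be checked directly. The sketch below (my own illustration on a random full-rank $A$) computes both sums from the eigendecomposition and confirms they match the direct norms:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((8, n))    # full column rank almost surely
M = A.T @ A                        # symmetric positive definite
gamma = 0.5
y = rng.standard_normal(n)

lam, Q = np.linalg.eigh(M)         # eigenvalues and orthonormal eigenvectors q_i
c = Q.T @ y                        # coefficients c_i(y) = q_i^T y

# The two sums from the answer.
lhs = np.sum(c**2 / (lam + gamma)**2)   # ||(M + gamma I)^{-1} y||^2
rhs = np.sum(c**2 / lam**2)             # ||M^{-1} y||^2
assert lhs <= rhs

# They agree with the norms computed directly.
assert np.isclose(lhs, np.linalg.norm(np.linalg.solve(M + gamma * np.eye(n), y))**2)
assert np.isclose(rhs, np.linalg.norm(np.linalg.solve(M, y))**2)
```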


Unfortunately, I cannot provide you with a clear proof of your statement in the general case where $A^TA$ is not invertible, but I may be able to give you some intuition and lead you in the right direction.

This Tikhonov penalization is called "ridge regression" in statistics. From there it is well known that your solution can be seen as the solution of the following constrained minimization problem: $$\min_{x\in R^p}||b-Ax||_2^2, \quad s.t. \sum_{i=1}^px_i^2 \leq t,$$ i.e. $(A^TA+\gamma I)^{-1}A^Tb$ is the solution of the corresponding penalized (Lagrangian) problem: $$\min_{x\in R^p}||b-Ax||_2^2 + \gamma \sum_{i=1}^px_i^2.$$
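That the closed form really solves the penalized problem can be verified numerically: the gradient of the penalized objective, $2(A^TA+\gamma I)x - 2A^Tb$, vanishes at $x_\gamma$, and by convexity no nearby point does better. A small sketch (random data of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((12, 5))
b = rng.standard_normal(12)
gamma = 0.3

# Closed-form ridge solution.
x_g = np.linalg.solve(A.T @ A + gamma * np.eye(5), A.T @ b)

# Gradient of ||b - Ax||^2 + gamma*||x||^2 vanishes at x_gamma.
grad = 2 * (A.T @ A + gamma * np.eye(5)) @ x_g - 2 * A.T @ b
assert np.allclose(grad, 0)

# The penalized objective; x_gamma beats random nearby perturbations.
def obj(x):
    return np.sum((b - A @ x)**2) + gamma * np.sum(x**2)

for _ in range(100):
    assert obj(x_g) <= obj(x_g + 0.1 * rng.standard_normal(5))
```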

It is immediately clear that $x_{\gamma}$ solves a constrained version of the minimization problem that $x_{\gamma=0}$ solves without constraint, and hence you get $\sum_{i=1}^p x_{\gamma,i}^2\leq \sum_{i=1}^p x_{\gamma=0,i}^2$, where $x_{\gamma=0}$ is the solution of the unconstrained optimization problem.

Assume $Z := A^TA$ to be invertible (which follows immediately if $A$ has full column rank). You can then write: \begin{align*} x_{\gamma} & = (A^TA+\gamma I)^{-1}A^Tb\\ & = (Z+\gamma I)^{-1}A^Tb\\ & = \left(Z(I+\gamma Z^{-1})\right)^{-1}A^Tb\\ & = (I+\gamma Z^{-1})^{-1}Z^{-1}A^Tb\\ & = (I+\gamma Z^{-1})^{-1}x_{\gamma=0} \\ & = \left(I-\gamma Z^{-1}(I+\gamma Z^{-1})^{-1}\right)x_{\gamma=0}\\ & = x_{\gamma=0} -\gamma Z^{-1}(I+\gamma Z^{-1})^{-1}x_{\gamma=0} \end{align*}
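The chain of identities above is easy to verify numerically. The sketch below (random full-column-rank $A$, my own example) checks the two key reformulations of $x_\gamma$ against the closed form:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))   # full column rank almost surely, so Z is invertible
b = rng.standard_normal(10)
gamma = 0.7
Z = A.T @ A
I = np.eye(4)

x0 = np.linalg.solve(Z, A.T @ b)              # x_{gamma=0}
x_g = np.linalg.solve(Z + gamma * I, A.T @ b) # x_gamma

Zinv = np.linalg.inv(Z)
# x_gamma = (I + gamma*Z^{-1})^{-1} x_{gamma=0}
assert np.allclose(x_g, np.linalg.solve(I + gamma * Zinv, x0))
# x_gamma = x_{gamma=0} - gamma*Z^{-1} (I + gamma*Z^{-1})^{-1} x_{gamma=0}
assert np.allclose(x_g, x0 - gamma * Zinv @ np.linalg.solve(I + gamma * Zinv, x0))
```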

Note: assuming $Z=I$ we then have as a special case:

$$x_{\gamma} = \frac{1}{1+\gamma}x_{\gamma=0}.$$

edit: something was wrong here; the assertion does not directly follow from the triangle inequality as it was applied here.

edit 2: Use the SVD of $A$, i.e. $A=UDV^T$, and let $\delta_i$ be the $i$-th singular value of $A$. It then follows from the above discussion that $$x_{\gamma} = VD^*V^Tx_{\gamma=0}= x_{\gamma=0} - VD^{**}V^T x_{\gamma=0},$$ where $D^*$ and $D^{**}$ are diagonal matrices with entries $\frac{\delta_i^2}{\delta_i^2+\gamma}$ and $\frac{1}{\delta_i^2/\gamma +1}$ respectively. While this representation is enough to get a feeling for the effect of shrinkage towards $0$ (if $\gamma\to \infty$, then obviously $VD^*V^T \to 0$ and $VD^{**}V^T \to I$), some more detailed calculation is still needed to compare the norms. Ian's approach seems to be a straightforward way to go.
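The SVD representation can likewise be checked numerically. The sketch below (my own random example) confirms both forms, $x_\gamma = VD^*V^Tx_{\gamma=0}$ and $x_\gamma = x_{\gamma=0} - VD^{**}V^Tx_{\gamma=0}$:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((9, 4))
b = rng.standard_normal(9)
gamma = 0.2

# Thin SVD: A = U diag(d) V^T, with d the singular values delta_i.
U, d, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

x0 = np.linalg.lstsq(A, b, rcond=None)[0]                     # x_{gamma=0}
x_g = np.linalg.solve(A.T @ A + gamma * np.eye(4), A.T @ b)   # x_gamma

Dstar = np.diag(d**2 / (d**2 + gamma))          # entries delta_i^2/(delta_i^2 + gamma)
Dss = np.diag(1.0 / (d**2 / gamma + 1.0))       # entries 1/(delta_i^2/gamma + 1)
assert np.allclose(x_g, V @ Dstar @ V.T @ x0)
assert np.allclose(x_g, x0 - V @ Dss @ V.T @ x0)
```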

edit 3: Using matrix norms the solution also follows immediately from the calculations above:

Note that the eigenvalues of the symmetric matrix $B:=VD^*V^T$ are $\lambda_i :=\frac{\delta_i^2}{\delta_i^2 + \gamma}$, where $\lambda_i \leq 1$ since $\gamma\geq 0$. Moreover, let $||B||$ be the spectral norm of the matrix $B$.

We then have $$||B|| = \sqrt{\lambda_{max}(B^TB)} = \max_{i}|\lambda_i|\leq 1.$$

Now let $||x||_v$ be a compatible vector norm (e.g. the Euclidean norm) and recall that we found $x_\gamma = B x_{\gamma=0}$ in the calculations above. We then have:

$$||x_\gamma||_v = || B x_{\gamma=0}||_v \leq ||B|| \cdot ||x_{\gamma=0}||_v \leq ||x_{\gamma=0}||_v.$$

This proves the assertion. It follows from the steps above that strict inequality holds for all $\gamma>0$ (provided $x_{\gamma=0}\neq 0$).

  • $A$ is full column rank, this implies that the inverse of $A^TA$ exists, right? Besides that, I don't see now why $$||x_0-\gamma Z^{-1}(I+\gamma Z^{-1})^{-1}x_0|| \geq ||x_0|| - || \gamma Z^{-1}(I+\gamma Z^{-1})^{-1}x_0 ||$$ is true. Can you explain it a little further? (2017-02-15)
  • There was a mistake; I don't know how to proceed from there right now. However, Ian's answer is nice, short, and seems to be correct. (2017-02-15)
  • That was the place where I got stuck also. (2017-02-15)
  • I have finished my proof. (2017-02-15)