
There are two parts to this problem: the first is implementation-related, the second is theoretical.

Part 1: I've been given $A$ & $B$ and need to solve for $x$. To do this I've been using an SVD. The problem is that when $A$ has a large condition number, the SVD loses enough precision that the returned matrix becomes useless (filled with NaN values).

The condition number is the quotient of the absolute values of the largest and smallest eigenvalues. Part of the reason the condition number is so large is that the smallest value is on the order of $10^{-19}$. I used the Jacobi method to calculate the eigenvalues of $A$.

Both the SVD & Jacobi implementations come from Numerical Recipes in C 3rd ed.

I've heard that you can threshold the SVD to ignore such small values, but I can't find any reference to it in the NR or LAPACK implementations, or in papers on SVD implementation. Am I missing something, or does this not exist?

Part 2: What other methods are used to solve $Ax = B$?

Thanks, Jon

  • Also: if you end up dealing with an ill-conditioned matrix, it often turns out that you're solving the wrong problem to begin with. Where did your matrices come from? (2012-04-22)

2 Answers

1

The most common approach is to use a matrix preconditioner.

Deciding which preconditioner to use is a matter of understanding its impact on your problem, so you'll need to consult a numerical analysis text to decide what is right for you. Often a simple Jacobi preconditioner will work, depending on the matrices in your system.
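For instance, a Jacobi preconditioner simply scales the system by the inverse of the diagonal of $A$ before solving. A minimal NumPy sketch (not from NR; the function name and example matrix are made up for illustration):

```python
import numpy as np

# Minimal sketch of Jacobi (diagonal) preconditioning, assuming A is square
# with nonzero diagonal entries.
def jacobi_preconditioned_solve(A, b):
    d_inv = 1.0 / np.diag(A)          # inverse of the Jacobi preconditioner M = diag(A)
    A_scaled = d_inv[:, None] * A     # left-preconditioned system: M^{-1} A x = M^{-1} b
    b_scaled = d_inv * b
    # The scaled system can have a much smaller condition number,
    # so a direct solve (or SVD) loses far less precision.
    return np.linalg.solve(A_scaled, b_scaled)

# Badly row-scaled example; the true solution is x = [1, 1].
A = np.array([[1e8, 2e8], [3e-6, 5e-6]])
b = np.array([3e8, 8e-6])
print(jacobi_preconditioned_solve(A, b))   # approximately [1, 1]
```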

This is a rather open-ended question, so it's probably best if you check some preconditioning references and then post a new question if you encounter more specific issues.

0

Numerical Recipes in C (2nd printing) actually discusses the truncated SVD scheme in some detail. If you want, skip to the last section ("Summary").

Let's assume the following about your problem:

You're trying to solve $Ax = b$:

  • $M$ is the number of model parameters you want to solve for
  • $N$ is the number of equations you're trying to simultaneously solve
  • $A$ is an $N \times M$ matrix of coefficients
  • $x$ is the $M \times 1$ solution or model-parameter vector (it has other names)
  • $b$ is the $N \times 1$ data vector (it has other names)

The "full" SVD gives you $A = U S V^T$, where:

  • $U$ is an $N \times N$ matrix of eigenvectors in data space
  • $S$ is an $N \times M$ diagonal matrix of singular values shared between model and data space
  • $V$ is an $M \times M$ matrix of eigenvectors in model space

You can solve $Ax = b$ via this equation:

$x = V S^{-1} U^T b$

$x_a = \sum_{i=1}^M V_{ai} \sum_{j=1}^M S^{-1}_{ij} \sum_{k=1}^N U^T_{jk} b_k$
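As a rough illustration, here is the same solve in NumPy (not the NR routines the question uses), assuming for the moment that none of the singular values are zero:

```python
import numpy as np

# Sketch of x = V S^{-1} U^T b using NumPy's SVD; illustrative only.
A = np.random.rand(6, 4)       # N = 6 equations, M = 4 unknowns
b = np.random.rand(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # "economy" SVD: U is N x M, s has M entries
x = Vt.T @ ((U.T @ b) / s)     # V @ S^{-1} @ U^T @ b, with S^{-1} applied element-wise

# x is the least-squares solution, so it satisfies the normal equations A^T A x = A^T b.
print(np.allclose(A.T @ A @ x, A.T @ b))
```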

In theory, $P$ of the singular values $\sigma_i$ in $S$ will be nonzero.

$P$ is the rank of the matrix $A$. It is the number of linearly independent equations in $A$. In a "full-rank" matrix, $P=M$. If the problem has more unknowns than linearly independent constraints ($M > P$), then the problem is said to be "rank-deficient". Note that the total number of constraints $N$ is not necessarily the number of linearly independent constraints ($P$).

Broadly, there are four cases to consider:

  • For problems where you have more constraints than unknowns ($N>M$, "least squares"), $P$ is typically equal to $M$.
  • For problems with more unknowns than constraints ($M>N$, "minimum length"), $P$ is typically equal to $N$.
  • For problems that are "full-rank" (that is, they have the same number of unknowns as independent constraints), the value of $P$ equals both $M$ and $N$, and the matrix $A$ is square.
  • For other problems, often ones in which the constraints are not linearly independent (one or more rows of $A$ is a combination of the other rows), the value of $P$ is not clear ahead of time. In this case, $P < \min(M,N)$; a short example follows this list.
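To make the last case concrete, here is a small, made-up example in NumPy:

```python
import numpy as np

# Illustrative only: P is the number of nonzero singular values, which is what
# np.linalg.matrix_rank counts (using a tolerance like the one discussed further down).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # twice the first row, so not independent
              [0.0, 1.0, 1.0]])
print(np.linalg.svd(A, compute_uv=False))   # one singular value is ~0
print(np.linalg.matrix_rank(A))             # P = 2 < min(M, N) = 3
```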

Truncated SVD

In practice, some of the singular values in $S$ will be very close to zero (e.g. 1.4e-15). The question is then whether they are "close enough" to zero that they should have been zero, or whether they are small, but should be considered non-zero.

When using SVD to solve a system of equations, the largest singular values contribute to fitting "the big picture", and the smallest singular values typically fit the noise. If the ratio between the largest and smallest singular values ($\kappa(A) \equiv \frac{\sigma_{max}}{\sigma_{min}}$, the condition number) is big enough, then the smallest singular value(s) will be lost in roundoff error.

Consider the following equation (from earlier):

$x_a = \sum_{i=1}^M V_{ai} \sum_{j=1}^M S^{-1}_{ij} \sum_{k=1}^M U^T_{jk} b_k$

Since $S$ is a diagonal matrix, $S_{ij}=S_{ii} \delta_{ij}$ and $S^{-1}_{ii} = \frac{1}{\sigma_i}$. This is where the problem arises: if any of the singular values $\sigma_i$ are zero, we would be dividing by zero. Instead, we throw away the $M-P$ singular values that are zero (or close enough to it!). We also throw away the eigenvectors associated with the zero singular values by splitting $U$, $S$ and $V$ into their non-null and "nullspace" components:

$U = \left[ U_P | U_0 \right]$

  • $U$ is an $N \times N$ matrix of $N$ eigenvectors in data space
  • $U_P$ is an $N \times P$ matrix of the $P$ eigenvectors in data space corresponding to the $P$ nonzero singular values.
  • $U_0$ is an $N \times (N-P)$ matrix of the $N-P$ eigenvectors in data space corresponding to the $N-P$ zero singular values. $U_0$ spans the nullspace in data space (the left null space of $A$).

$V = \left[ V_P | V_0 \right]$

  • $V$ is an $M \times M$ matrix of $M$ eigenvectors in model space
  • $V_P$ is an $M \times P$ matrix of the $P$ eigenvectors in model space corresponding to the $P$ nonzero singular values.
  • $V_0$ is an $M \times (M-P)$ matrix of the $M-P$ eigenvectors in model space corresponding to the $M-P$ zero singular values. $V_0$ spans the nullspace in model space (the null space of $A$).

$S = \begin{bmatrix} S_P & 0 \\ 0 & S_0 \end{bmatrix}$

  • $S$ is an $N \times M$ diagonal matrix of singular values shared between model and data space
  • $S_P$ is a $P \times P$ diagonal matrix of the nonzero singular values.
  • $S_0$ is an $(N-P) \times (M-P)$ diagonal matrix containing the zero-valued singular values (a small numerical check follows this list).
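If it helps to see the partition concretely, here is a small NumPy check (reusing the made-up rank-2 matrix from above) that the nullspace blocks carry no information about $A$:

```python
import numpy as np

# When the trailing singular values are zero, A = U_P S_P V_P^T exactly:
# the blocks U_0, S_0 and V_0 contribute nothing. Illustrative sketch only.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 1.0]])          # rank P = 2
U, s, Vt = np.linalg.svd(A)
P = 2
A_rebuilt = U[:, :P] @ np.diag(s[:P]) @ Vt[:P, :]
print(np.allclose(A, A_rebuilt))         # True
```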

We can then simplify our problem by neglecting all the singular values that are zero. This is why we sometimes call it "truncated SVD":

$x = \begin{bmatrix} V_P & V_0 \end{bmatrix} \begin{bmatrix} S_P & 0 \\ 0 & 0 \end{bmatrix}^{-1} \begin{bmatrix} U_P^T \\ U_0^T \end{bmatrix} b$

$x = V_P S_P^{-1} U_P^T b$

$x_a = \sum_{i=1}^P V_{P,ai} \sum_{j=1}^P S^{-1}_{P,ij} \sum_{k=1}^N U^T_{P,jk} b_k$

Whereas we cannot invert $S$ if it has any zeros, we can invert $S_P$, because we have thrown away the parts of the matrix that have zeros.

If we tried to invert $S$ with singular values that were almost zero, the inverse of these tiny singular values would be huge, and would have the effect of adding destabilizing roundoff noise to the solution.
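Put together, a minimal NumPy sketch of the truncated solve might look like this; the function name and example system are hypothetical, and it assumes $P$ has already been chosen (choosing $P$ is what the Summary below is about):

```python
import numpy as np

# Sketch of x = V_P S_P^{-1} U_P^T b, given a rank P decided beforehand.
def tsvd_solve(A, b, P):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_P  = U[:, :P]        # keep the P data-space eigenvectors
    s_P  = s[:P]           # keep the P nonzero singular values
    Vt_P = Vt[:P, :]       # keep the P model-space eigenvectors
    return Vt_P.T @ ((U_P.T @ b) / s_P)

# Hypothetical rank-deficient 3 x 3 system (same matrix as above):
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 1.0]])
b = np.array([6.0, 12.0, 2.0])
x = tsvd_solve(A, b, P=2)
print(A @ x)               # reproduces b even though A is singular
```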

Summary

"Truncated" SVD is a variant of "full" SVD in which all the singular values equal to zero — and their corresponding eigenvectors in $U$ and $V$ — are removed from the problem. The almost-zero singular values add noise and instability to the solution, so we need to define a tolerance to determine which singular values are "close enough" to zero to be eliminated.

Choosing this tolerance is tricky, and it is very closely related to calculating the rank of a matrix via SVD. MATLAB uses tol = max(size(A)) * eps(max(s)) as its default absolute tolerance. This is approximately equal to $\max(M,N)\, \sigma_{max}\, \epsilon$ in the above notation, where $\epsilon$ is the machine precision (about $2 \times 10^{-16}$ for double precision).

Once you determine a cutoff, you need only to count how many of the singular values of $S$ are larger than this cutoff. This number is $P$, the estimated rank of your matrix (given your tolerance).
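For example, with NumPy's SVD (the test matrix here is arbitrary), that cutoff and the resulting $P$ could be computed like this:

```python
import numpy as np

# Sketch of the MATLAB-style cutoff described above.
A = np.random.rand(200, 50) @ np.random.rand(50, 100)   # a 200 x 100 matrix of rank <= 50
U, s, Vt = np.linalg.svd(A, full_matrices=False)

tol = max(A.shape) * np.finfo(A.dtype).eps * s.max()    # ~ max(M, N) * sigma_max * eps
P = int(np.sum(s > tol))                                 # estimated rank
print(P)                                                 # expect 50

# Now slice U, s and Vt down to their first P columns/values before forming x.
```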

Typically, a library SVD routine returns the full SVD matrices ($U$, $S$ and $V$). It's up to you to figure out $P$ by setting a cutoff and then truncate $U$, $S$ and $V$ accordingly.