$M), then the problem is said to be "rank-deficient". Note that the total number of constraints $N$ is not necessarily the number of linearly independent constraints ($P$).
There are broadly four cases to consider (a short numerical check follows the list):
- For problems where you have more constraints than unknowns ($N>M$, "least squares"), $P$ is typically equal to $M$.
- For problems with more unknowns than constraints ($M>N$, "minimum length"), $P$ is typically equal to $N$.
- For problems that are "full-rank"—that is, they have the same number of unknowns as independent constraints—the value of $P$ equals both $M$ and $N$, and the matrix $A$ is square.
- For other problems—often ones in which the constraints are not linearly independent (one or more rows of $A$ are combinations of the other rows)—the value of $P$ is not clear ahead of time. In this case, $P < \min(M,N)$.
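As a quick numerical check (a NumPy sketch; the matrices here are made up purely for illustration), `np.linalg.matrix_rank` estimates $P$ in each situation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined "least squares" case: more constraints than unknowns (N > M).
A_tall = rng.standard_normal((6, 3))      # N=6, M=3
print(np.linalg.matrix_rank(A_tall))      # typically P = M = 3

# Underdetermined "minimum length" case: more unknowns than constraints (M > N).
A_wide = rng.standard_normal((3, 6))      # N=3, M=6
print(np.linalg.matrix_rank(A_wide))      # typically P = N = 3

# Rank-deficient case: one row is a combination of the others.
A_def = rng.standard_normal((4, 4))
A_def[3] = A_def[0] + A_def[1]            # duplicate information
print(np.linalg.matrix_rank(A_def))       # P = 3 < min(M, N)
```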
Truncated SVD
In practice, some of the singular values in $S$ will be very close to zero (e.g. 1.4e-15). The question is then whether they are "close enough" to zero that they should have been zero, or whether they are small but should be considered non-zero.
When using SVD to solve a system of equations, the largest singular values contribute to fitting "the big picture", and the smallest singular values typically fit the noise. If the ratio between the largest singular value and the smallest singular value ($\kappa(A) \equiv \frac{\sigma_{max}}{\sigma_{min}}$—the condition number) is big enough, then the smallest singular value(s) will be lost in roundoff error.
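For a concrete check, here is a minimal NumPy sketch (the matrix `A` is an arbitrary example) showing that the condition number is just the ratio of the extreme singular values:

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((50, 10))

# Singular values come back sorted from largest to smallest.
s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]

# np.linalg.cond computes the same ratio by default (2-norm condition number).
assert np.isclose(kappa, np.linalg.cond(A))
```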
Consider the following equation (from earlier):
$x_a = \sum_{i=1}^M V_{ai} \sum_{j=1}^M S^{-1}_{ij} \sum_{k=1}^M U^T_{jk} b_k$
Since $S$ is a diagonal matrix, $S_{ij}=S_{ii} \delta_{ij}$ and $S^{-1}_{ii} = \frac{1}{\sigma_i}$. We have a problem: if any of the singular values $\sigma_i$ are zero, computing $S^{-1}$ would mean dividing by zero. Instead, we throw away the $\min(M,N)-P$ singular values that are zero (or close enough to it!). We also throw away the eigenvectors associated with the zero singular values by splitting $U$, $S$ and $V$ up into their non-null and "nullspace" components (a short code sketch of this split follows the definitions below):
$U = \left[ U_P | U_0 \right]$
- $U$ is a $N \times N$ matrix of $N$ eigenvectors in data space
- $U_P$ is a $N \times P$ matrix of $P$ eigenvectors in data space corresponding to the $P$ nonzero singular values.
- $U_0$ is a $N \times (N-P)$ matrix of $N-P$ eigenvectors in data space corresponding to the $N-P$ zero singular values. The columns of $U_0$ span the null space in data space.
$V = \left[ V_P | V_0 \right]$
- $V$ is a $M \times M$ matrix of $M$ eigenvectors in model space
- $V_P$ is a $M \times P$ matrix of $P$ eigenvectors in model space corresponding to the $P$ nonzero singular values.
- $V_0$ is a $M \times (M-P)$ matrix of $M-P$ eigenvectors in model space corresponding to the $M-P$ zero singular values. The columns of $V_0$ span the null space in model space (the null space of $A$).
$S = \begin{bmatrix} S_P & 0 \\ 0 & S_0 \end{bmatrix}$
- $S$ is a $N \times M$ diagonal matrix of singular values shared between model and data space
- $S_P$ is a $P \times P$ diagonal matrix of nonzero singular values.
- $S_0$ is a $(N-P) \times (M-P)$ diagonal matrix containing the zero-valued singular values.
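In code, this split is nothing more than column slicing. A minimal NumPy sketch, assuming $P$ has already been chosen (how to choose it is discussed below); the example matrix is constructed to be rank-deficient on purpose:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
A[:, 3] = A[:, 0] + A[:, 1]            # force a zero singular value (rank 3)

# full_matrices=True returns the full N x N U and M x M V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
P = 3                                   # rank decided by some tolerance (see below)

U_P, U_0 = U[:, :P], U[:, P:]           # N x P and N x (N-P)
V = Vt.T
V_P, V_0 = V[:, :P], V[:, P:]           # M x P and M x (M-P)
S_P = np.diag(s[:P])                    # P x P diagonal of nonzero singular values
```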
We can then simplify our problem by neglecting all the singular values that are zero. This is why we sometimes call it "truncated SVD":
$x = \begin{bmatrix} V_P & V_0 \end{bmatrix} \begin{bmatrix} S_P & 0 \\ 0 & 0 \end{bmatrix}^{-1} \begin{bmatrix} U_P^T \\ U_0^T \end{bmatrix} b$
$x = V_P S_P^{-1} U_P^T b$
$x_a = \sum_{i=1}^P V_{Pai} \sum_{j=1}^P S^{-1}_{Pij} \sum_{k=1}^N U^T_{Pjk} b_k$
Whereas we cannot invert $S$ if it has any zeros, we can invert $S_P$, because we have thrown away the parts of the matrix that have zeros.
If we tried to invert $S$ with singular values that were almost zero, the inverse of these tiny singular values would be huge, and would have the effect of adding destabilizing roundoff noise to the solution.
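To see this effect, here is a hypothetical NumPy example (the matrix, noise level, and cutoff are invented for illustration) comparing the naive inverse of $S$ with the truncated solve $x = V_P S_P^{-1} U_P^T b$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
A[:, 3] = A[:, 0] + A[:, 1] + 1e-14 * rng.standard_normal(6)  # nearly rank-deficient
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Naive pseudoinverse: dividing by the tiny singular value blows up.
x_naive = Vt.T @ np.diag(1.0 / s) @ U.T @ b

# Truncated SVD: keep only the P singular values above a (relative) cutoff.
P = int(np.sum(s > 1e-10 * s[0]))
x_trunc = Vt.T[:, :P] @ np.diag(1.0 / s[:P]) @ U[:, :P].T @ b

print(np.linalg.norm(x_naive), np.linalg.norm(x_trunc))  # huge vs. modest
```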
Summary
"Truncated" SVD is a variant of "full" SVD in which all the singular values equal to zero — and their corresponding eigenvectors in $U$ and $V$ — are removed from the problem. The almost-zero singular values add noise and instability to the solution, so we need to define a tolerance to determine which singular values are "close enough" to zero to be eliminated.
Choosing this tolerance is tricky, and it is very closely related to calculating the rank of a matrix via SVD. MATLAB uses `tol = max(size(A)) * eps(max(s))` as its default absolute tolerance. This is approximately equal to $\max(M,N)\,\sigma_{max}\,\epsilon$ in the above notation, where $\epsilon$ is the machine precision (~1e-16 for double precision).
Once you determine a cutoff, you need only count how many of the singular values in $S$ are larger than this cutoff. This number is $P$, the estimated rank of your matrix (given your tolerance).
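The same recipe translates directly to NumPy (a sketch; the rank-deficient matrix is constructed for illustration). This is also the default cutoff used by `np.linalg.matrix_rank`:

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((6, 4))
A[:, 3] = A[:, 0] + A[:, 1]                 # force a zero singular value

s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * np.finfo(A.dtype).eps * s.max()
P = int(np.sum(s > tol))                    # estimated rank (3 for this construction)
print(P, np.linalg.matrix_rank(A))          # matrix_rank applies the same default cutoff
```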
Typically, a library SVD routine returns the full SVD matrices ($U$, $S$ and $V$). It's up to you to figure out $P$ by setting a cutoff and then truncating $U$, $S$ and $V$ accordingly.
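That said, some higher-level routines will apply the cutoff for you: for example, NumPy's `np.linalg.pinv` and `np.linalg.lstsq` accept an `rcond` argument that plays exactly this role (a small sketch; the matrix and cutoff are again invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
A[:, 3] = A[:, 0] + A[:, 1]                 # rank-deficient
b = rng.standard_normal(6)

# rcond is a relative cutoff: singular values below rcond * s.max() are treated as zero.
x_pinv = np.linalg.pinv(A, rcond=1e-10) @ b
x_lstsq, residuals, P, s = np.linalg.lstsq(A, b, rcond=1e-10)
print(P)                                    # lstsq also reports the effective rank it used
```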