
Posting link here because it is as much math as it is programming:

Replacing Cholesky factorization from IMSL MCHOL (Fortran) in C#

(Feel free to delete if inappropriate)

(Apologies for having lapsed into abject mathematical ignorance and incompetency over the last few decades).

Thank you for getting this far!

From the documentation of the function I'm trying to replicate:

Computes an upper triangular factorization of a real symmetric matrix A plus a diagonal matrix D, where D is determined sequentially during the Cholesky factorization in order to make A + D nonnegative definite.

Routine MCHOL computes a Cholesky factorization, $R^T R$, of A + D, where A is symmetric and D is a diagonal matrix with sufficiently large diagonal elements such that A + D is nonnegative definite. The routine is similar to one described by Gill, Murray, and Wright (1981, pages 108−111). Here, though, we allow A + D to be singular.

The algorithm proceeds sequentially by rows. If A + D is singular, the Cholesky factor R is taken to have some rows that are entirely zero. The i-th row of A + D is declared to be linearly dependent on the first i − 1 rows if the following two conditions are satisfied: (images that I can't really put in here, see the documentation here)

I want to make sure I really understand what is being solved for here. Cholesky factorization wants a positive definite matrix, so we're solving for the smallest modifier that does that? And after that, it's merely a matter of algorithm?
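To check my own understanding, here is a minimal sketch of the idea in Python. This is an assumption on my part, not the actual MCHOL or Gill–Murray–Wright algorithm (which has more careful pivoting and bounds, and allows zero rows for the singular case): whenever the pivot that plain Cholesky would take the square root of falls below a small threshold `delta`, bump the diagonal just enough so the step can proceed.

```python
import math

def modified_cholesky(a, delta=1e-8):
    """Return upper-triangular R and diagonal shifts d with R^T R = A + diag(d).

    Simplified sketch only -- not the exact IMSL MCHOL logic.  Proceeds
    row by row like ordinary Cholesky; if a pivot would be < delta,
    the diagonal is shifted by the smallest amount making it delta.
    """
    n = len(a)
    r = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for i in range(n):
        # pivot = a[i][i] minus the squares of the entries already computed
        # in column i of R
        pivot = a[i][i] - sum(r[k][i] ** 2 for k in range(i))
        if pivot < delta:
            d[i] = delta - pivot  # smallest shift making the pivot equal delta
            pivot = delta
        r[i][i] = math.sqrt(pivot)
        for j in range(i + 1, n):
            s = a[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            r[i][j] = s / r[i][i]
    return r, d
```

For the indefinite matrix `[[4, 2], [2, -3]]`, ordinary Cholesky fails at the second pivot; this sketch instead shifts the second diagonal entry by roughly 4 and completes the factorization.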

  • 0
    Right, more or less you want a modified Cholesky decomposition. Vanilla Cholesky requires that the matrices given to it are symmetric positive definite; what modified Cholesky does is perturb the diagonal elements of your matrix as needed so that Cholesky can proceed as usual. What the routine returns is the triangular matrix that, when multiplied with its transpose, yields the original matrix with perturbations. (2011-10-20)

1 Answer


Here is a partial answer: why is such a routine needed?

A Cholesky factorization is $XX^t = A$. But not every A has such a factorization. Whenever matrices get you down, try the 1×1 matrices: numbers. The transpose of $[x]$ is just $[x]$, so $[x][x]^t = [x^2] = [a]$ works (for real numbers $x$) only if $a \geq 0$. You can't take the square root of a negative number (and get a real number), and you can't take the square root of a "negative" (definite) matrix. So what happens if we ask for the Cholesky decomposition of $[-2]$? Well, the routine says $-2$ is too small, so we just choose $d = 2$ and find the Cholesky factorization of $[a] + [d] = [a + d] = [-2 + 2] = [0]$: $[0][0]^t = [0] = [-2] + [2]$

Any symmetric matrix can be thought of (after an orthogonal change of basis, via the spectral theorem) as a diagonal matrix: in other words, as a few separate numbers. For example, take a = diag(1,−2,3). We can almost find a nice Cholesky decomposition: $\begin{bmatrix}1 & 0 & 0 \\ 0 & \sqrt{-2} & 0 \\ 0 & 0 & \sqrt{3} \end{bmatrix} \begin{bmatrix}1 & 0 & 0 \\ 0 & \sqrt{-2} & 0 \\ 0 & 0 & \sqrt{3} \end{bmatrix}^t = \begin{bmatrix}1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$ except for the pesky $\sqrt{-2}$ not being a real number. Never fear, d = diag(0,2,0) is here! $\begin{bmatrix}1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \sqrt{3} \end{bmatrix} \begin{bmatrix}1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \sqrt{3} \end{bmatrix}^t = \begin{bmatrix}1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 3 \end{bmatrix} + \begin{bmatrix}0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix} $
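In the diagonal case, the whole procedure reduces to per-entry arithmetic, which can be sketched in a few lines of Python (again an illustration of the idea, not the actual MCHOL routine): each negative entry $a_{ii}$ gets the smallest shift $d_{ii} = -a_{ii}$, so $a_{ii} + d_{ii} = 0$ and we take $\sqrt{0} = 0$, exactly as in the diag(1, −2, 3) example above.

```python
import math

def diagonal_modified_cholesky(diag_a):
    """For a diagonal matrix diag(diag_a), return the diagonals r of R
    and d of D such that R^T R = A + D, with D as small as possible."""
    # smallest nonnegative shift making each diagonal entry nonnegative
    d = [max(0.0, -a) for a in diag_a]
    # the "Cholesky factor" is just entrywise square roots
    r = [math.sqrt(a + di) for a, di in zip(diag_a, d)]
    return r, d
```

For `diag_a = [1, -2, 3]` this gives `d = [0, 2, 0]` and `r = [1, 0, sqrt(3)]`, matching the worked example.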

Things are a little messier when A and D are not both diagonal (for this algorithm we can assume one is diagonal, but not both), but I think the basic idea is the same: fixing negative diagonal entries is, in effect, fixing negative eigenvalues.

  • 0
    I know this problem from factor analysis, where the diagonal of the (symmetric) covariance matrix partly represents uncorrelated variance, aka item-specific variance (including measurement error). To find a best representation of the covariance by fewer-than-full-rank factors, the item-specific variance is first removed from the diagonal. But it can only be estimated, and the estimate may be too optimistic, so that negative eigenvalues occur (Heywood cases). Possibly a routine such as MCHOL is/was then used to find an SPD solution by *iteratively approximating* the optimal diagonal. (2011-10-20)