
I'm trying to understand this algorithm by Charles Van Loan for evaluating a matrix polynomial $p(\mathbf A)=\sum\limits_{k=0}^q b_k \mathbf A^k=b_0\mathbf I+b_1\mathbf A+\cdots$ (where $\mathbf A$ is $n\times n$):

$\displaystyle s=\lfloor q\rfloor, \quad r=\lfloor q/s\rfloor$
$\displaystyle \mathbf Y=\mathbf A^s$
$\displaystyle \text{for}\quad j=1,\dots,n$
$\displaystyle\quad\quad \mathbf y_0^{(j)}=\mathbf e_j$ ($\mathbf e_j$ is the $j$th column of the identity matrix)
$\displaystyle\quad\quad \text{for}\quad k=1,\dots,s-1$
$\displaystyle\quad\quad\quad \mathbf y_k^{(j)}=\mathbf A\mathbf y_{k-1}^{(j)}$
$\displaystyle\quad\quad \mathbf f_0^{(j)}=\sum_{k=rs}^{q}b_k\mathbf y_{k-rs}^{(j)}$
$\displaystyle\quad\quad \text{for}\quad k=1,\dots,r$
$\displaystyle\quad\quad\quad \mathbf f_k^{(j)}=\mathbf Y\mathbf p_{k-1}^{(j)}+\sum_{i=0}^{s-1}b_{s(r-k)+i}\mathbf y_{i}^{(j)}$

I've gone through it a number of times, but the quantity (vector?) $\mathbf p_{k-1}^{(j)}$ seems never to be defined anywhere in the paper. What might it be? Also, the paper claims (or, more probably, this is how I (mis)understood it) that only three matrices need to be stored for the evaluation: $\mathbf A$, $\mathbf Y$, and the final $p(\mathbf A)$. However, I can't see how to organize things so that an extra array for storing the $\mathbf y_k^{(j)}$ isn't needed (otherwise, four matrices are required instead of the three claimed in the paper).

Could anyone enlighten me on how to implement this algorithm properly? Better yet, is there code available anywhere that implements it? (I'm language-agnostic; whatever code you post, I can likely translate into the language I'm currently using.)

Thanks for any help!

  • For those who can't access IEEE, [here](http://www.cs.cornell.edu/cv/ResearchPDF/A%20Note%20on%20the%20Evaluation%20of%20Matrix%20Polynomials.pdf) is a mirror on Van Loan's home page. (2011-06-30)
  • Since the paper says $p(A) = [f_r^{(1)}|\cdots|f_r^{(n)}]$, I would guess $p_{k-1}^{(j)}$ should be $f_{k-1}^{(j)}$. Otherwise, why bother to compute $f_k^{(j)}$ for $k \neq r$? (2011-06-30)
  • And you're probably right about the fourth matrix (but I also didn't read the paper carefully). I guess the author is very good at "selling smoke". (2011-06-30)
  • @Giacomo: I don't want to judge Van Loan too harshly; after all, he is the co-author of a certain useful reference for *matrix computations*... (2011-07-01)

3 Answers


The storage required for the $y^{(j)}_k$ is less than a "full" matrix whenever $s \ll n$: only $s$ such vectors are needed during step $j$, and their storage can be reused in the next step.

The Question states $s$ to be the floor of $q$, the degree of the polynomial, but this is mistaken. Per the discussion of the paper preceding Algorithm I, $s$ can be any integer strictly between 1 and $q$. While this does not guarantee $s \ll n$, it allows for it. See the discussion of applying Algorithm II to the Taylor polynomial for $e^A$, where $s=4$ and $q=16$ were chosen. [Note also the discussion of minimizing "work count" in Algorithm II by choosing $s$ around $\sqrt{q/2}$.]
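As a rough illustration of that work count (this is my own back-of-envelope model, not the paper's exact flop count): per column, Algorithm II needs about $s-1$ matrix-vector products for the $y_i$ and $r = \lfloor q/s \rfloor$ products with $Y$, and forming $Y = A^s$ by repeated multiplication costs about $s-1$ matrix-matrix products, i.e. roughly $s-1$ more matrix-vector products per column when amortized over the $n$ columns. Minimizing $2(s-1) + q/s$ over $s$ does land near $\sqrt{q/2}$:

```python
import math

def matvec_equivalents(q, s):
    # Rough per-column cost model (an assumption, not the paper's exact count):
    # (s - 1) matvecs for the y_i, r products with Y, plus the amortized cost
    # of forming Y = A^s (about s - 1 more matvecs per column).
    r = q // s
    return 2 * (s - 1) + r

q = 16
costs = {s: matvec_equivalents(q, s) for s in range(2, q)}
best = min(costs, key=costs.get)
print(best, round(math.sqrt(q / 2), 2))  # the minimizer sits near sqrt(q/2) ~ 2.83
```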

That still leaves the mysterious $p^{(j)}_k$. My guess is that it is a typo: $f^{(j)}_k$ is being computed by updating $Yf^{(j)}_{k-1}$ after initializing $f^{(j)}_0$, but I need to do a bit of scribbling to confirm that this works out.

Added: My guess above is correct, that we should read $f^{(j)}_{k-1}$ instead of $p^{(j)}_{k-1}$ at one point in Algorithm II where the latter appears.

The algorithm may be understood as combining two ideas. One is that the matrix polynomial $p(A)$ will be evaluated column by column, i.e. by applying it successively to unit vectors $e_j$, $j=1,..,n$.

The other idea is simple though a bit cumbersome to spell out. Recall that $1 \lt s \lt q$ where $q$ is the degree of $p$. Let the polynomial:

$$ p(x) = \sum^q_{i=0} b_i x^i$$

be grouped according to the largest power of $x^s$ that divides each term and rewrite $p(x)$ in an expansion involving powers of $x^s$ and "coefficients" that are polynomials of degree at most $s-1$. Taking $r = \lfloor q/s \rfloor$, and assuming any indicated coefficients $b_m$ where $m \gt q$ are zero:

$$ p(x) = \sum^r_{k=0} x^{sk} \beta_k(x)$$

where $\beta_k(x) = \sum^{s-1}_{i=0} b_{sk+i} x^i$ are the polynomial "coefficients" as above.
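The regrouping is easy to sanity-check numerically on a scalar example (the particular values of $q$, $s$, and $x$ here are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
q, s = 16, 4
r = q // s
b = rng.standard_normal(q + 1)  # random coefficients b_0..b_q

def p(x):
    return sum(b[i] * x**i for i in range(q + 1))

def beta(k, x):
    # polynomial "coefficient" of x^{sk}; b_m with m > q counts as zero
    return sum(b[s*k + i] * x**i for i in range(s) if s*k + i <= q)

x = 1.7
regrouped = sum(x**(s*k) * beta(k, x) for k in range(r + 1))
assert np.isclose(p(x), regrouped)
```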

Assume that $Y = A^s$ has already been computed, e.g. by binary exponentiation or other scheme.
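For instance, a repeated-squaring sketch (NumPy's `np.linalg.matrix_power` does essentially this internally):

```python
import numpy as np

def mat_pow(A, s):
    # Binary exponentiation: O(log s) matrix-matrix products instead of s - 1.
    result = np.eye(A.shape[0])
    P = A.copy()
    while s > 0:
        if s & 1:           # this bit of the exponent is set
            result = result @ P
        P = P @ P           # square: P holds A^(2^i)
        s >>= 1
    return result
```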

The outermost loop of Algorithm II is then iteratively applying $p(A)$ to $n$ unit vectors $v = e_j$ as explained above. For simplicity we just describe computing $p(A)v$ for an arbitrary vector $v$ of compatible dimension, a column of length $n$ where $A$ is $n \times n$. In effect we thus omit superscript $(j)$ from the remainder of the discussion.

First is a loop to initialize $y_0 = v$ and $y_i = A y_{i-1}$ for $i = 1,..,s-1$.

Second is a loop that essentially applies Horner's method to get $p(A)v$ using the expansion in powers of $A^s$ and "coefficients" $\beta_k(A)v$. Note that the latter are column vectors, and for $k = 0,..,r$ we evaluate $\beta_k(A)v$ by taking the appropriate linear combination of previously initialized $y_i$. That is:

$$ f_0 = \beta_r(A)\, v, \qquad f_k = Y f_{k-1} + \beta_{r-k}(A)\, v $$

for $k = 1,..,r$. This shows us that the term $p^{(j)}_{k-1}$ is a misprint for $f^{(j)}_{k-1}$ in Algorithm II.
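Putting the two loops together for a single vector $v$ gives the following sketch (NumPy; the function name and the use of `np.linalg.matrix_power` for $Y$ are my choices, not the paper's):

```python
import numpy as np

def p_times_v(A, b, v, s):
    """Evaluate p(A) v via the recurrence above.
    b holds the q+1 coefficients b_0..b_q; 1 < s < q is the block size."""
    q = len(b) - 1
    r = q // s
    # y_i = A^i v for i = 0..s-1
    y = [v]
    for _ in range(s - 1):
        y.append(A @ y[-1])
    Y = np.linalg.matrix_power(A, s)
    # f_0 = beta_r(A) v; coefficients b_m with m > q are zero
    f = sum(b[s*r + i] * y[i] for i in range(q - s*r + 1))
    # Horner steps: f_k = Y f_{k-1} + beta_{r-k}(A) v
    for k in range(1, r + 1):
        f = Y @ f + sum(b[s*(r - k) + i] * y[i] for i in range(s))
    return f
```

Applying this to $v = e_j$ for $j = 1,\dots,n$ and stacking the results as columns reproduces $p(A)$.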

  • This is interesting... if you tried it out in some computing environment, would you mind sharing the code? (2011-08-11)
  • @J. M.: I'd be glad to share. I suspect for many readers code would make Van Loan's idea appear in a more natural light. At first I was thinking just a vanilla C implementation (for the sake of being self-contained), but it occurs to me an [Octave](http://www.gnu.org/software/octave/) implementation would be more concise, if perhaps not ideal for checking running times. (2011-08-12)
  • Yes, Octave code (which is essentially MATLAB-ready) would be lovely. No rush, though. :) Thanks in advance! (2011-08-13)
  • Hey, just wondering: did you ever manage to come up with neat code for Van Loan's method? (2011-10-30)
  • (after almost five years) Did you manage to come up with a working implementation of this? (2016-10-14)
  • @J.M.: It had slipped my mind. I have a distant recollection of doing this coding. Perhaps it will be easy to reconstruct... (2016-10-14)
  • I'd say "take your time", but hopefully it doesn't take another five years... :) (2016-10-15)

Here is some Octave code that implements "Algorithm II" quite faithfully. The arrays named with "_shifted" are indexed using $i+1$ in the last component wherever the paper uses $i$, because arrays in MATLAB/Octave are indexed from 1 instead of 0. All the arithmetic is done mod 29 to make it easy to check against a simpler reference implementation. I haven't tried to make it particularly fast; it's really just meant to demonstrate that the algorithm works.

q=20;
s=floor(sqrt(2*q));
r=floor(q/s);
n=100;
modulo=29;
A=randi(modulo,n,n);
b_shifted=randi(modulo,q+1,1);

pA_van_loan = zeros(n,n);

# Compute A^s
As=eye(n);
for k = 1:s
  As=A*As;
  As=mod(As, modulo);
end

# Van Loan algorithm II
for j = 1:n
  y_shifted = zeros(n,s);
  y_shifted(j,1) = 1;
  for k = 1:s-1
    y_shifted(:,k+1) = A*y_shifted(:,k-1+1);
    y_shifted(:,k+1) = mod(y_shifted(:,k+1), modulo);
  end

  f = zeros(n,1);
  for m = 0:(q-s*r)
    f = f + b_shifted(s*r+m+1) * y_shifted(:,m+1);
  end
  f = mod(f, modulo);
  for k = 1:r
    f = As * f;
    for m = 0:s-1
      f = f + b_shifted(s*(r-k)+m+1) * y_shifted(:,m+1);
    end
    f = mod(f, modulo);
  end

  pA_van_loan(:,j) = f;
end

# Compute polynomial naively for reference
pA_reference = zeros(n,n);
Aj = eye(n);
for j = 0:q
  pA_reference = pA_reference + b_shifted(j+1)*Aj;
  Aj=A*Aj;
  Aj=mod(Aj,modulo);
end
pA_reference=mod(pA_reference,modulo);

assert(isequal(pA_van_loan, pA_reference));

Here is some Python code that implements Algorithm II from Van Loan. It is not optimized for speed; instead, I tried to follow the steps of the pseudocode in the paper as closely as possible. The code below can be optimized, and I have shown two possibilities in comments (but I am sure there are more).

import numpy as np

def van_loan(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    ''' Compute sum_{k=0}^q b[k] * A**k using Van Loan's algorithm II

    A needs to be a square matrix (there is no check in this function).
    b is a vector containing the q+1 coefficients.
    '''
    n = A.shape[0]
    q = len(b)-1
    s = int(np.floor(np.sqrt(2*q)))
    r = q // s
    Y = np.linalg.matrix_power(A, s)
    p_van_loan = np.zeros((n, n))
    y = np.zeros((n, s))
    for j in range(n):
        # Initialize y as being the j-th column of the n-by-n identity matrix
        y[:, 0] = 0
        y[j, 0] = 1

        # Compute the remaining y according to y_k = A*y_{k-1}
        for k in range(1, s):
            y[:, k] = np.dot(A, y[:, k-1])

        # Compute f_0
        f = np.zeros(n)
        for l in range(0, q-s*r+1):
            f += b[l+s*r]*y[:, l]
        # Compute f_0 alternative:
        # f = np.einsum("i,ni->n", b[s*r:q+1], y[:, :q-s*r+1])

        # Compute the remaining f iteratively
        for k in range(1, r+1):
            f = np.dot(Y, f)
            for l in range(0, s):
                f += b[s*(r-k)+l]*y[:, l]
            # This for loop can be replaced by the following (faster) 
            # line of code:
            # f += np.einsum("s,ns->n", b[s*(r-k):s*(r-k+1)], y[:, :s])

        # Set the j-th column to the computed f
        p_van_loan[:, j] = f

    return p_van_loan

To demonstrate that this produces the same result as a direct implementation of the original expression for $p(\mathbf{A})$, the following code can be used:

def van_loan_reference(A, b):
    n = A.shape[0]
    Ap = np.eye(n)
    p = b[0]*Ap
    for j in range(1, len(b)):
        Ap = np.dot(Ap, A)
        p += b[j]*Ap
    return p

n = 3
q = 8
np.random.seed(0)
A = np.random.rand(n, n)
b = np.random.randn(q+1)
print(van_loan(A, b))
print(van_loan_reference(A, b))

This produces:

[[26.07842457 33.42897135 37.28842938]
 [22.01005978 31.30224965 33.21369951]
 [31.17386387 42.02787474 48.15436486]]
[[26.07842457 33.42897135 37.28842938]
 [22.01005978 31.30224965 33.21369951]
 [31.17386387 42.02787474 48.15436486]]