2

The new find-first-set ("ffs") CPU instruction found in the multimedia extensions (MMX) 4 apparently made it possible to start doing Newton-Raphson division (according to Wikipedia).

Does anyone know whether attempts have already been made to use companion matrices for the same purpose?

Explanation:

The polynomial equation $ax-b=0$ is not the most complicated one, but it is still very interesting, as its solution $x = b/a$ is precisely the quotient of the numbers $b$ and $a$. Division is to this day embarrassingly slow even on modern desktop CPUs, with latencies of tens of clock cycles.
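For reference, the Newton-Raphson division mentioned above can be sketched roughly as follows. This is only an illustration in Python, not how a CPU implements it: the name `newton_divide`, the iteration count, and the use of `int.bit_length()` as a stand-in for a find-first-set / count-leading-zeros instruction are assumptions made for this example. The starting value $48/17 - 32/17\,d$ is the usual linear estimate of $1/d$ for $d \in [0.5, 1)$.

```python
import math

def newton_divide(b, a, iterations=5):
    """Approximate b / a for a positive integer a (hypothetical helper)."""
    if a <= 0:
        raise ValueError("this sketch only handles positive integer divisors")
    # Position of the highest set bit; stands in for an ffs/clz-style instruction.
    shift = a.bit_length()
    # Scale the divisor into [0.5, 1) by adjusting the exponent only.
    d = math.ldexp(a, -shift)
    # Linear initial estimate of 1/d (constants are precomputed in practice).
    x = 48.0 / 17.0 - 32.0 / 17.0 * d
    for _ in range(iterations):
        # Newton-Raphson step for f(x) = 1/x - d; uses only multiply and subtract.
        x = x * (2.0 - d * x)
    # Undo the scaling: b / a = b * (1/d) * 2**-shift.
    return math.ldexp(b * x, -shift)

print(newton_divide(3, 2))
print(newton_divide(2940, 7))
```

The error roughly squares at every step, so a handful of iterations is enough for double precision in this sketch.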

Like any polynomial equation, it is related to a matrix.

Consider $${\bf M} = \begin{bmatrix}0&a\\1&b\end{bmatrix}$$ which would represent $x^2-ax-b$. This is not the polynomial we want to find a zero of, but we can continuously alter it to make the $x^2$ term less influential on the roots. For example, we can multiply $-ax-b$ by a large constant or, what amounts to the same thing in this case, change the 1 on the off-diagonal to some $\epsilon>0$. For example, we can choose $\epsilon = 2^{k}$, which would be implementable with a simple bit shift. I suppose this could also be pre-stored in a cached table if the extra bits cost too much for the CPU registers.

Here is a plot of how far off the root is as a function of the number of bits $k$ we can afford per iteration, for the particular example $3/2$:

[Plot: deviation of the root as a function of the number of bits $k$, for the example $3/2$]
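The dependence on $k$ can also be sketched numerically. Computing $\det(x{\bf I} - {\bf M})$ directly for the matrix with $\epsilon$ in the lower-left entry gives $x^2 - bx - \epsilon a$, and the first row of the eigenvector equation shows that the last/first component ratio of the dominant eigenvector is $\lambda / a$. The snippet below assumes $a, b > 0$ and a small $\epsilon = 2^{-k}$ (the sign of the exponent is an assumption here), and the name `shifted_ratio` is made up for the example.

```python
import math

def shifted_ratio(a, b, k):
    """Last/first component ratio of the dominant eigenvector of
    [[0, a], [eps, b]] with eps = 2**-k (assumed small), for a, b > 0."""
    eps = math.ldexp(1.0, -k)
    # Dominant root of the characteristic polynomial x^2 - b*x - eps*a.
    lam = (b + math.sqrt(b * b + 4.0 * eps * a)) / 2.0
    # First row of M v = lam v reads a*v2 = lam*v1, so v2/v1 = lam/a.
    return lam / a

# Deviation from 3/2 for a few values of k.
for k in range(0, 17, 4):
    print(k, shifted_ratio(2, 3, k) - 3 / 2)
```

To first order the deviation behaves like $\epsilon / b$, so under these assumptions each extra bit in $k$ roughly halves it.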

A second approach would be to expand $(x-b/a)^2$ or $(x+b/a)(x-b/a)$, or some other polynomial whose roots we know are closely related to the fraction we seek, and then modify the matrix accordingly, multiplying by a suitable scalar such as the greatest common divisor.

Let's just conclude that there are plenty of approaches, alright? Anyway, to the point:

These matrices have the property that, for almost any initial vector $\bf v$, the quotient between the first and last components of ${\bf M}^k {\bf v}$ approaches a root of the polynomial as $k$ grows.
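A minimal sketch of this iteration, using only integer shifts, multiplications and additions inside the loop, might look as follows. It assumes positive integers $a$ and $b$ and uses the matrix $\begin{bmatrix}0 & a\,2^{k}\\ 1 & b\,2^{k}\end{bmatrix}$, which is the matrix with $\epsilon = 2^{-k}$ in the lower-left entry scaled by $2^{k}$ (scaling does not change the eigenvectors). The name `power_iteration_ratio` and the starting vector are made up for the example, and the final quotient is still formed with an ordinary division, so this only illustrates the convergence.

```python
def power_iteration_ratio(a, b, k, steps):
    """Iterate v <- M v with M = [[0, a << k], [1, b << k]] and return
    the last/first component ratio (hypothetical helper, a, b > 0)."""
    v1, v2 = 1, 1  # arbitrary starting vector
    for _ in range(steps):
        # One matrix-vector product using shifts, multiplies and adds.
        v1, v2 = (a << k) * v2, v1 + (b << k) * v2
    # Python integers are unbounded; real hardware would need rescaling.
    return v2 / v1

for steps in (1, 2, 4, 8):
    print(steps, power_iteration_ratio(2, 3, 10, steps))
```

With $a = 2$, $b = 3$ the printed ratios settle slightly above $3/2$; the remaining offset comes from the $2^{-k}$ perturbation and not from the number of steps.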

Could there be any benefit to using companion matrices, or other matrix-vector based approaches, in combination with this new ffs instruction?

  • 1
    The matrix you mention is the companion matrix for $x^2 - bx - a$, not for a first degree polynomial. (2017-02-12)
  • 0
    Yes of course, great observation. How sloppy of me. My brain hopped over some steps. I should explain better. (2017-02-12)

1 Answer

0

Some preliminary experiments in Octave suggest that there may be some promise in the approach. As we can see, 6 bits of precision with $\geq 88\%$ probability is achievable with the rather simple approach (2).

6 bits of precision would mean that if the true value is 420 we could get $420 \pm \frac{420}{2^6} \approx 420 \pm 6.56$, that is, an error of up to $2^{-6} = 1/64 = 1.5625\%$ of the true value.
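The arithmetic behind "bits of precision" can be checked with a small helper. The definition used here, the negative base-2 logarithm of the relative error, is how the figure above reads and is an assumption of this sketch; the name `correct_bits` is made up for the example.

```python
import math

def correct_bits(approx, true_value):
    """Number of correct bits, taken as -log2 of the relative error."""
    rel = abs(approx - true_value) / abs(true_value)
    return math.inf if rel == 0 else -math.log2(rel)

tolerance = 420 / 2 ** 6  # absolute error allowed at 6 bits
print(tolerance)
print(correct_bits(420 + tolerance, 420))
```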

There is a $35\%$ chance of getting (at least) $10$ bits correct in the first 8 iterations (blue curve).

[Plot: results of the Octave experiments, including the blue curve mentioned above]