The new find-first-set ("ffs") CPU instruction found in the multimedia extensions (MMX) 4 apparently made it possible to start doing Newton-Raphson division (according to Wikipedia).
Does anyone know whether attempts have already been made to use companion matrices for the same purpose?
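For context, here is a minimal sketch of Newton-Raphson division in Python. It assumes that the role of the bit-scan step is to normalize the divisor into $[0.5, 1)$; `math.frexp` stands in for that step here. The function name and the iteration count are illustrative choices, and the initial estimate $48/17 - (32/17)\,m$ is the usual linear one, whose constants would be precomputed in practice.

```python
import math

def newton_raphson_divide(n: float, d: float, iterations: int = 4) -> float:
    """Approximate n / d for d > 0 using multiplies, adds and power-of-two scaling."""
    if d <= 0:
        raise ValueError("this sketch assumes d > 0")
    # frexp gives d == m * 2**e with 0.5 <= m < 1; in hardware this
    # normalization is what a leading-bit scan plus a shift would provide.
    m, e = math.frexp(d)
    # Linear initial estimate of 1/m on [0.5, 1); the two constants
    # would be stored, not computed by division, in a real implementation.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # Newton step for f(x) = 1/x - m: the relative error is squared each time.
        x = x * (2.0 - m * x)
    # 1/d == (1/m) * 2**(-e), so undo the normalization with ldexp.
    return math.ldexp(n * x, -e)

print(newton_raphson_divide(3.0, 2.0))
```

Each pass through the loop roughly doubles the number of correct bits, which is why a handful of iterations is enough for double precision.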
Explanation:
The polynomial equation $ax-b=0$ is not the most complicated one, but it is still very interesting, as its solution $x = b/a$ is precisely the quotient of the numbers $b$ and $a$. Division is to this day embarrassingly slow even on modern desktop CPUs, with latencies in the tens of clock cycles.
Like any polynomial equation, it is related to a matrix.
Consider $${\bf M} = \begin{bmatrix}0&b\\1&a\end{bmatrix}$$ whose characteristic polynomial is $x^2-ax-b$. This is not the polynomial we want a zero of, but we can alter it continuously to make the $x^2$ term matter less for the roots. For example, we can multiply $-ax-b$ by a large constant or, what amounts to the same thing in this case (up to a rescaling of $x$), change the 1 on the off-diagonal to some small $\epsilon>0$. Choosing $\epsilon = 2^{-k}$ would be implementable with a simple bit shift.
[Plot: how far off the root is as a function of the number of bits $k$ we can afford per iteration, for the particular example 3/2.]
I suppose this could also be pre-stored in a cached table if the extra bits cost too much for the CPU registers.
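The dependence of the error on $k$ can be sketched numerically. The snippet below is only an illustration under stated assumptions: it takes the matrix to be $\begin{bmatrix}0&b\\ \epsilon&a\end{bmatrix}$ with $\epsilon = 2^{-k}$ and $a, b > 0$, and it measures the quotient of the components of the dominant eigenvector, which is $b/\lambda$ for the dominant eigenvalue $\lambda$. The function name `quotient_error` is hypothetical.

```python
import math

def quotient_error(a: float, b: float, k: int) -> float:
    """Gap between b/a and the dominant-eigenvector quotient for eps = 2**-k."""
    eps = math.ldexp(1.0, -k)  # eps = 2**-k
    # Dominant eigenvalue of [[0, b], [eps, a]], i.e. the larger root
    # of the characteristic polynomial x^2 - a*x - eps*b.
    lam = (a + math.sqrt(a * a + 4.0 * eps * b)) / 2.0
    # An eigenvector (v1, v2) satisfies -lam*v1 + b*v2 = 0, so v1/v2 = b/lam.
    return abs(b / lam - b / a)

# The example from the text: b/a = 3/2.
for k in range(0, 21, 4):
    print(k, quotient_error(2.0, 3.0, k))
```

To first order the gap behaves like $\epsilon\, b^2/a^3$, so under these assumptions each extra bit in $k$ roughly halves the error.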
A second approach would be to expand $(x-b/a)^2$ or $(x+b/a)(x-b/a)$, or some other polynomial whose roots we know are closely related to the fraction we seek, and then modify the matrix accordingly, multiplying through by a suitable common scalar (such as a common denominator).
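Written out, and taking $a^2$ as the common scalar (an assumption; the text does not fix which scalar is meant), the two candidate polynomials become

$$a^2\left(x-\frac{b}{a}\right)^2 = a^2x^2 - 2abx + b^2, \qquad a^2\left(x+\frac{b}{a}\right)\left(x-\frac{b}{a}\right) = a^2x^2 - b^2.$$

Both have integer coefficients when $a$ and $b$ are integers, but the leading coefficient is $a^2$ rather than 1.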
Let's just conclude that there are plenty of approaches, alright? Anyway, to the point:
These matrices have the property that, for almost any initial $\bf v$, the quotient between the first and last components of ${\bf M}^k {\bf v}$ approaches a root of the polynomial as $k$ grows (here $k$ is the iteration count, not the bit count above).
Could there be any benefit to companion matrices, or other matrix-vector-based approaches, in combination with this new ffs instruction?
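The iteration itself can be sketched with integers and shifts only. This is a rough illustration, not a proposed implementation: it assumes the scaled matrix $\begin{bmatrix}0 & 2^k b\\ 1 & 2^k a\end{bmatrix}$ (the "multiply by a large constant" variant), positive integers $a$ and $b$, and a hypothetical function name. Note that the final quotient is formed exactly with `Fraction` purely to inspect the result; a real implementation would still need some way to extract it.

```python
from fractions import Fraction

def power_iteration_quotient(a: int, b: int, k: int, steps: int) -> Fraction:
    """Iterate v <- M v for M = [[0, b << k], [1, a << k]] and return v[0] / v[1]."""
    v1, v2 = 1, 1  # arbitrary starting vector
    for _ in range(steps):
        # Matrix-vector product written out; scaling by 2**k is a left shift.
        v1, v2 = (b * v2) << k, v1 + ((a * v2) << k)
    # Exact quotient of the first and last components, for inspection only.
    return Fraction(v1, v2)

# The example 3/2 with k = 20 bits and a few iterations.
q = power_iteration_quotient(2, 3, 20, 5)
print(float(q), float(Fraction(3, 2) - q))
```

Under these assumptions the quotient settles after very few steps, but it settles slightly below $b/a$, with a gap of about $b^2/(2^k a^3)$, so the accuracy is set by $k$ rather than by the number of iterations.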

