Given $A,B\in SO(3)$, direct matrix multiplication computes $C=AB$ with 27 multiplies. The group $SO(3)$ is a $3$-dimensional manifold. This suggests that direct matrix multiplication, which thinks of elements of $SO(3)$ as 9-dimensional, is not optimal. What is the minimal number of multiplies necessary to compute $C$?
Fast multiplication of orthogonal matrices
2 Answers
Off the top of my head, if you represent elements of $SO(3)$ via lifting them to elements of $SU(2)$, a product can be computed with 8 multiplies. But they're multiplies of complex numbers which are 4 multiplies each for 32 in all.
But, we can use Strassen's algorithm to compute the matrix product in 7 multiplies, and Karatsuba to compute a complex product in 3 multiplies for 21 in all.
Of course, elements of $SU(2)$ aren't general matrices, they have a special form. If you write out the four entries, it's clear you can compute the terms with 4 complex multiplies along with some adds and conjugations, so 12 real multiplies in all if we use Karatsuba or 16 otherwise.
Using Euler-Rodrigues form, we can represent the rotation as 4 real numbers. The stated composition formula requires 16 products. But we can get away with only 10 products. First compute $a_1 a_2$, $b_1 b_2$, $c_1 c_2$, and $d_1 d_2$, then use the idea of Karatsuba to get the rest of the terms in pairs. For example, you can get $a_1b_2 + a_2b_1$ by the single product $(a_1 + b_1)(a_2 + b_2)$ and subtracting off the unwanted terms.
There are more representations listed on wikipedia, but I haven't looked into how much a product would cost in those representations.
Keep in mind that fewer products isn't always better: the adds and other things take time too, and different algorithms behave differently with respect to numerical stability.
- 
1What I would like to know is if matrix form is competitive with Euler-Rodrigues/quaternion form. Quaternion form needs to convert to matrix form to be used in opengl but quaternions make slerp and rotation composition easy. Whereas composition and slerp seem more difficult with matrix form but matrix form doesn't take an efficiency hit since it is already in a form that opengl expects. So to compare the total costs of either form it is necessary to know the true cost of composition of matrix form among other things. – 2012-06-09
- 
1If you have a particular application in mind, the details of the calculation you need to do are important! But anyways, it looks like it's only 10 multiplies to convert from ER to the usual matrix form (I haven't tried to optimize it), so in terms of counting multiplies, it's still better to use ER internally, then converting to a matrix as necessary rather than doing 27 multiplies for a matrix product. But you really need to profile these things, since counting multiplies is only part of the actual runtime cost. – 2012-06-09
- 
0More particularly, recent GPUs have dedicated hardware for doing matrix multiplications (or what amounts to dedicated hardware for it) which means that those computations are likely to be much faster than any algorithmic 'acceleration' you can provide. – 2012-12-31
I was interested in the same question, and found this thread. Clearly, quaternion or SU(2) multiplication takes no more than 12 real multiplications, since complex multiplication takes no more than 3 real multiplications
http://mathworld.wolfram.com/ComplexMultiplication.html
The question though is if there are similar tricks that can be done to reduce the number of multiplications for rotation matrices, without converting to another representation. In the 2x2 rotation matrix case, it again takes no more than 3 multiplies, by the same reasoning as in complex multiplication. It's possible, in the 3x3 case, to use the cross product to get the last column of C from the first two. This saves 3 multiplications, and uses the fact that C is orthogonal. It doesn't use the orthogonality of A or B though. Can something better be done?
