16

I am working on an algorithm which is supposed to align a pair of images. The motion model, which describes the pose $p$ of one image (with respect to the other) in 3D space, is purely rotational.

The theory behind my question:

According to Rodrigues' formula for rotation matrices, the matrix exponential $R(p)=e^{\hat{p}}$ for a vector $p=(p_x,p_y,p_z)^T\in\mathbb{R}^3$ is given by $ \begin{align} R(p)&=e^{\hat{p}}=I + \frac{\hat{p}}{||p||} \sin(||p||) + \frac{\hat{p}^2}{||p||^2} (1-\cos(||p||)) \end{align} $ where $||p||=\sqrt{p_x^2+p_y^2+p_z^2}$ and $ \begin{align} \hat{p} &= \begin{bmatrix} 0 & -p_z & p_y \\ p_z & 0 & -p_x \\ -p_y & p_x & 0 \end{bmatrix} \end{align}. $

Writing out the entries of $R(p)$ yields $ \begin{align} R(p) = \begin{bmatrix} 1-(p_y^2+p_z^2)s & -p_z r+p_xp_y s & p_y r+p_xp_z s \\ p_z r+p_xp_y s & 1-(p_x^2+p_z^2)s & -p_x r+p_yp_z s \\ -p_y r+p_xp_z s & p_x r+p_yp_z s & 1-(p_x^2+p_y^2)s \end{bmatrix} \end{align} $ where $r=\frac{\sin(||p||)}{||p||}$ and $s=\frac{1-\cos(||p||)}{||p||^2}$ were introduced for clarity.
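To check this closed form numerically, a minimal Octave/MATLAB sketch (test vector and variable names chosen arbitrarily) could compare it against the matrix exponential of $\hat{p}$; for $||p||\to 0$ one would simply fall back to $R=I$:

    % Sketch: build R(p) from the closed form above and compare it with expm(hat(p)).
    p = [0.3; -0.2; 0.5];                              % example rotation vector (||p|| > 0)
    P = [0 -p(3) p(2); p(3) 0 -p(1); -p(2) p(1) 0];    % hat(p)
    n = norm(p);
    r = sin(n)/n;                                      % the "r" introduced above
    s = (1-cos(n))/n^2;                                % the "s" introduced above
    R = eye(3) + r*P + s*P^2;
    disp(norm(R - expm(P)))                            % ~1e-16
    disp(norm(R'*R - eye(3)))                          % R is orthogonal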

Now, let $R^s(p)\in\mathbb{R}^{9\times 1}$ be the stacked version of $R(p)$, i.e. all columns are stacked into one column and let $R^s_i(p)$ denote the $i$-th element of $R^s(p)$ where $1\leq i \leq 9$.

The Jacobian matrix $J_{R}\in\mathbb{R}^{9\times 3}$ with respect to vector $p$ is then given by $ \begin{align} J_{R}&=\frac{\partial R^s}{\partial p} \\ &= \begin{bmatrix} \frac{\partial R^s_1}{\partial p_x} & \frac{\partial R^s_1}{\partial p_y} & \frac{\partial R^s_1}{\partial p_z} \\ \vdots & \vdots & \vdots \\ \frac{\partial R^s_9}{\partial p_x} & \frac{\partial R^s_9}{\partial p_y} & \frac{\partial R^s_9}{\partial p_z} \end{bmatrix} \end{align} $

The first element of $J_R$ can then be expressed as $ \begin{align} \frac{\partial R^s_1}{\partial p_x} &=\frac{\partial}{\partial p_x} \Bigl(1-(p_y^2+p_z^2)\frac{1-\cos(||p||)}{||p||^2}\Bigr) \\ &=-\Bigl[(p_y^2+p_z^2)\frac{\sin(||p||)\,p_x\,||p|| - (1-\cos(||p||))\,2 p_x}{||p||^4}\Bigr] \end{align} $ the second one as $ \begin{align} \frac{\partial R^s_1}{\partial p_y} &=\frac{\partial}{\partial p_y} \Bigl(1-(p_y^2+p_z^2)\frac{1-\cos(||p||)}{||p||^2}\Bigr) \\ &=-\Bigl[2p_y\frac{1-\cos(||p||)}{||p||^2}+(p_y^2+p_z^2)\frac{\sin(||p||)\,p_y\,||p|| - (1-\cos(||p||))\,2 p_y}{||p||^4}\Bigr] \end{align} $ and the third element of the first row as $ \begin{align} \frac{\partial R^s_1}{\partial p_z} &=\frac{\partial}{\partial p_z} \Bigl(1-(p_y^2+p_z^2)\frac{1-\cos(||p||)}{||p||^2}\Bigr) \\ &=-\Bigl[2p_z\frac{1-\cos(||p||)}{||p||^2}+(p_y^2+p_z^2)\frac{\sin(||p||)\,p_z\,||p|| - (1-\cos(||p||))\,2 p_z}{||p||^4}\Bigr] \end{align} $
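One quick way to sanity-check these three expressions is a central finite difference on $R^s_1(p)=1-(p_y^2+p_z^2)s$; a small Octave/MATLAB sketch (test point and step size are arbitrary):

    % Sketch: finite-difference check of the three partials of R^s_1 at a test point.
    f  = @(q) 1 - (q(2)^2 + q(3)^2) * (1-cos(norm(q)))/norm(q)^2;   % R^s_1(p)
    p  = [0.3; -0.2; 0.5];
    n  = norm(p);  s = (1-cos(n))/n^2;
    ds = @(pk) pk*(sin(n)*n - 2*(1-cos(n)))/n^4;       % partial of s w.r.t. p_k
    g  = [ -(p(2)^2+p(3)^2)*ds(p(1));
           -(2*p(2)*s + (p(2)^2+p(3)^2)*ds(p(2)));
           -(2*p(3)*s + (p(2)^2+p(3)^2)*ds(p(3))) ];   % analytic gradient from above
    h  = 1e-6;  gn = zeros(3,1);
    for k = 1:3
      e = zeros(3,1);  e(k) = h;
      gn(k) = (f(p+e) - f(p-e)) / (2*h);               % central difference
    end
    disp(norm(g - gn))                                 % should be tiny (~1e-9)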

Already a big thanks for reading this far ;-)

My question: Is the derivation of these first 3 elements of $J_R$ mathematically correct? If I did something wrong, can you spot the error(s) and possibly help me correct it(/them)?

Once I have got everything straight, I will post the complete Jacobian matrix as the answer to this question, so anyone confronted with the same problem can double-check their own results ...

The reason I need this Jacobian:

The pose $p$ of an image in the application mentioned above can be described by a unit rotation axis and an angle (in radians). Alternatively, the unit axis can be scaled by the rotation angle, which is what I am using. To do the exponential mapping from $\mathbb{R}^3$ to SO(3) I use Rodrigues' formula (which is the standard method, I think).

The algorithm "slides" down the gradient of an error function, which is governed by the pose $p$, i.e. E(p)=\sum_i(I_1(x'(x_i;p))-I_0(x_i))^2 where x'(\cdot) is the function that maps pixels from one image space to the other image space with respect to the current pose $p$.

In order to compute the gradient of $E(p)$ I will have to compute the gradient of the rotation matrices, which leads to my actual problem.

  • 0
    I use a purely rotational motion model to align a pair of images. The pose parameters, i.e. the rotation axis and angle, are updated according to the gradient of an error function that measures the correct overlap between the two images. In that gradient I come across the gradient of the rotation matrices. That's what I need it for. 2011-09-14

7 Answers

14

I don't think you need the general Jacobian $\frac{\partial}{\partial \mathbf p}\exp(\hat{\mathbf p})$, but only the much simpler Jacobian $\left.\frac{\partial}{\partial \mathbf p}\exp(\hat{\mathbf p})\right|_{\mathbf p=\mathbf 0}$, i.e. the derivative evaluated at the identity ($\mathbf p=\mathbf 0$).

Background

The group of 3d rotations (SO3) is a matrix Lie group. Thus, in general, we have the matrix exponential:

$\exp(\mathtt M) := \sum_{k\ge 0} \frac{1}{k!} \mathtt M^k$

which maps an element of the matrix Lie algebra onto the set of matrix Lie group elements. Furthermore, we have a function $\hat \cdot: \mathbb R^n \rightarrow \mathbb R^{m\times m}, \quad \hat{\mathbf a} = \sum_{i=1}^n a_i\mathtt G_i$ which maps an $n$-vector onto the set of matrix Lie algebra elements (= matrices). For SO3 the generators are: $ \mathtt G_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix},\quad \mathtt G_2 =\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix},\quad \mathtt G_3 =\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} $
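For concreteness, a small Octave/MATLAB sketch of the hat map built from these generators (it reproduces the skew matrix $\hat p$ from the question; the test vector is arbitrary):

    % Sketch: hat(a) as a linear combination of the SO3 generators.
    G1 = [0 0 0; 0 0 -1; 0 1 0];
    G2 = [0 0 1; 0 0 0; -1 0 0];
    G3 = [0 -1 0; 1 0 0; 0 0 0];
    a  = [0.3; -0.2; 0.5];
    A  = a(1)*G1 + a(2)*G2 + a(3)*G3;                  % hat(a)
    disp(norm(A - [0 -a(3) a(2); a(3) 0 -a(1); -a(2) a(1) 0]))   % 0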

Derivative at the identity

For matrix Lie groups, it can be shown that $ \left.\frac{\partial}{\partial a_k}\exp(\hat {\mathbf a})\right|_{\mathbf a=\mathbf 0} = \mathtt G_k.$

Thus for SO3 we have $ \left.\frac{\partial}{\partial \mathbf p}\exp(\hat {\mathbf p})\right|_{\mathbf p=\mathbf 0} = \begin{bmatrix}\mathtt G_1 & \mathtt G_2 & \mathtt G_3\end{bmatrix}$, i.e. a row vector of three $3\times 3$ matrices, hence a $3\times 3 \times 3$ Jacobian tensor. (Alternatively, one can stack the columns of each $\mathtt G_i$ into a 9-vector and obtain a $9\times 3$ Jacobian matrix.)
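A sketch of the stacked $9\times 3$ variant, checked with a central finite difference of $\exp(\hat{\mathbf p})$ around $\mathbf p=\mathbf 0$ (step size chosen arbitrarily):

    % Sketch: 9x3 Jacobian of p -> exp(hat(p)) at p = 0; column k stacks G_k.
    G1 = [0 0 0; 0 0 -1; 0 1 0];  G2 = [0 0 1; 0 0 0; -1 0 0];  G3 = [0 -1 0; 1 0 0; 0 0 0];
    J0 = [G1(:), G2(:), G3(:)];                        % 9x3
    h  = 1e-6;                                         % finite-difference check, column 1
    Rp = expm(h*G1);  Rm = expm(-h*G1);
    disp(norm((Rp(:) - Rm(:))/(2*h) - J0(:,1)))        % ~1e-12 (same idea for G2, G3)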

Background: Least-squares optimization

Let us assume we would like to minimize the function $F:\mathbb R^n \rightarrow \mathbb R$ with respect to a Euclidean vector $\mathbf x$. Furthermore, let us assume we have a least-squares problem with

$F = f(\mathbf x)^\top f(\mathbf x)$

Following Gauss-Newton we repeatedly solve for an update $\delta$ $ \frac{\partial f}{\partial \mathbf x^{(m)}}^\top\frac{\partial f}{\partial \mathbf x^{(m)}} \delta = -\frac{\partial f}{\partial \mathbf x^{(m)}}^\top f(\mathbf x^{(m)})$ and update our estimate $ \mathbf x^{(m+1)}=\delta + \mathbf x^{(m)}$
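To make the Euclidean scheme concrete, here is a tiny Octave/MATLAB sketch of the Gauss-Newton loop; the residual $f$, its Jacobian and the starting point are made up purely for illustration:

    % Sketch of the Euclidean Gauss-Newton iteration (residual f is made up).
    f  = @(x) [x(1)^2 + x(2) - 3;  x(1) - x(2)^2 + 1]; % example residual, f: R^2 -> R^2
    Jf = @(x) [2*x(1), 1;  1, -2*x(2)];                % its Jacobian
    x  = [2; 2];                                       % initial estimate x^(0)
    for m = 1:10
      J     = Jf(x);
      delta = -(J'*J) \ (J'*f(x));                     % solve the normal equations
      x     = x + delta;                               % Euclidean update
    end
    disp(f(x)'*f(x))                                   % F(x) should be ~0 here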

Least-squares optimization on matrix Lie groups

This scheme is only valid for Euclidean vector spaces and needs to be adapted for matrix Lie groups. In particular, we calculate the derivative in the tangent space around the identity:

$\mathtt J :=\left.\frac{\partial f(\exp(\hat{\mathbf p})\cdot\mathtt R^{(m)})}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}$ (Can be calculated from the generators and using the chain rule.)

Then we solve for $\delta$:

$ \mathtt J^\top\mathtt J \delta = -\mathtt J^\top f(\mathtt R^{(m)})$

Finally we adapt the update rule: $ \mathtt R^{(m+1)}= \exp(\hat\delta)\cdot \mathtt R^{(m)}.$
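As a concrete (made-up) illustration of this scheme, the following Octave/MATLAB sketch recovers a rotation aligning two known vector pairs; the residual, the synthetic data and the closed-form Jacobian $-\widehat{R\mathbf a_i}$ (obtained from the generators and the chain rule) are chosen just for this example:

    % Sketch: Gauss-Newton on SO3 with residual f(R) = [R*a1 - b1; R*a2 - b2] (made up).
    hat = @(v) [0 -v(3) v(2); v(3) 0 -v(1); -v(2) v(1) 0];
    a1 = [1;0;0];  a2 = [0;1;0];
    Rtrue = expm(hat([0.4; -0.3; 0.2]));
    b1 = Rtrue*a1;  b2 = Rtrue*a2;                     % synthetic observations
    R  = eye(3);                                       % initial estimate R^(0)
    for m = 1:10
      fval  = [R*a1 - b1; R*a2 - b2];                  % 6x1 residual
      J     = [-hat(R*a1); -hat(R*a2)];                % d f(exp(hat(p))*R)/dp at p = 0
      delta = -(J'*J) \ (J'*fval);
      R     = expm(hat(delta)) * R;                    % left-multiplicative update
    end
    disp(norm(R - Rtrue))                              % ~0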

  • 1
    This is a very good question. In a nutshell, both options work, but one has to be consistent: if you post-multiply in the Jacobian, you also have to post-multiply in the update rule. 2016-12-26
2

The approach of least-squares optimization on matrix Lie groups is also explained in a technical report on Minimization on the Lie Group SO(3) and Related Manifolds.

  • 0
    The report uses $R(w) = R_0 \exp(\delta)$ whereas @Hauke uses left-multiplication. Why? 2016-11-03
2

A recent paper by Guillermo Gallego and Anthony Yezzi gives a compact formula for the derivative of the rotation matrix with respect to the exponential map coordinates: http://arxiv.org/pdf/1312.0788v1.pdf

Formula (III.7) on page 5 is what you were looking for.

  • 0
    It is hard to connect this question with this formula in the paper. 2015-05-22
1
  1. question: Not a complete answer, but shouldn't you differentiate with respect to $p$ and not $\omega$? But maybe you can compute the derivative with respect to $\omega$ as a pre-step, and then differentiate that with respect to $p$.

  2. question: The Jacobian is not a $9\times 4$ matrix because the 4 values of a 4-component representation are not independent of each other. Either you take an arbitrary axis with an angle - this would not be unique - or you take a unit vector with an angle - then the vector components cannot be chosen freely, as they need to satisfy $\sqrt{x^2+y^2+z^2}=1$. So you only need 3 independent parameters to represent a rotation. Therefore the Jacobian is $9\times 3$ and not $9\times 4$.

  • 0
    I have edited and rephrased my question. Would you mind taking a 2nd look? 2011-09-14
1

At this point I am quite certain that I computed the first three elements of $J_R$ correctly. So I can continue computing the remaining elements of the matrix.

One way to confirm that the analytic gradient is correct is to compute the numeric gradient and then compare the values of the two. For this you might want to check the gradient() function in MATLAB or Octave.

The script I wrote goes something like this:

    S=1;
    [x,y,z]=meshgrid(-2*pi:S:2*pi);
    v=f(x,y,z);               % this function is the one at element R(0,0)
    [dx,dy,dz]=df(x,y,z);     % these are the three partial gradients above
    [ndx,ndy,ndz]=gradient(v,S);
    figure
    hold on;
    quiver3(x,y,z,ndx,ndy,ndz, 'color','red','LineWidth',1);
    quiver3(x,y,z,dx,dy,dz, 'color','green','LineWidth',1);
    hold off;
    legend('numeric gradient','analytic gradient');

If all the arrows have approximately the same orientation and length, then this is a good indicator that the derived partial gradients are correct.

1

There are also a few other options:

  • compute the derivatives using quaternions - this article explains how to differentiate the exponential map this way
  • use a simpler Jacobian based on angular velocity - note that in minimization, the underlying model should be locally as linear as possible; simpler derivatives lead to a more stable method (I think)
  • use numerical differentiation
1

In this problem, the norm of the vector is an important parameter $\eqalign{ \beta^2 &= \|p\|^2 = p^Tp \cr \beta\,d\beta &= p^Tdp \cr d\beta &= \frac{p^Tdp}{\beta} \cr }$

The skew matrix has some very interesting properties $\eqalign{ P^T &= -P \cr P^2 &= pp^T - \beta^2I \cr }$

The above relationships are well known; now here's something less well known.
Vectorizing $P$ is equivalent to multiplying $p$ by a special $(9\times 3)$ matrix $\eqalign{ {\rm vec}(P) &= V p \cr }$

To save vertical space, here is the transpose of the vectorization matrix $ V^T = \begin{bmatrix} 0& 0&\,\,\,\,0&\,\,\,\,0& 0& 1&\,\,\,\,0& -1& 0\\ 0& 0& -1&\,\,\,\,0& 0& 0&\,\,\,\,1&\,\,\,\,0& 0\\ 0& 1&\,\,\,\,0& -1& 0& 0&\,\,\,\,0&\,\,\,\,0& 0\\ \end{bmatrix} $
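Here is a quick Octave/MATLAB sketch that checks both the skew-matrix identity $P^2=pp^T-\beta^2 I$ and the vectorization relation ${\rm vec}(P)=Vp$ for an arbitrary test vector:

    % Sketch: verify P^2 = p*p' - beta^2*I and vec(P) = V*p numerically.
    p  = [0.3; -0.2; 0.5];
    P  = [0 -p(3) p(2); p(3) 0 -p(1); -p(2) p(1) 0];
    Vt = [0 0 0  0 0 1  0 -1 0;
          0 0 -1 0 0 0  1  0 0;
          0 1 0 -1 0 0  0  0 0];
    V  = Vt';                                          % the 9x3 vectorization matrix
    disp(norm(P^2 - (p*p' - (p'*p)*eye(3))))           % 0
    disp(norm(P(:) - V*p))                             % 0  (P(:) is vec(P))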

Now consider the rotation matrix generated by applying the exponential function to $P$. Using Cayley-Hamilton, this can be reduced to a quadratic function of $P$, i.e. Rodrigues' rotation formula. $\eqalign{ R &= \exp(P) = I + \lambda P + \mu P^2 \cr }$

The coefficients $(\lambda, \mu)$ are functions of the parameter $(\beta)$, and it will be convenient to have expressions for their derivatives and differentials $\eqalign{ &\lambda = \frac{\sin(\beta)}{\beta}&\,\,\,&\mu = \frac{1-\cos(\beta)}{\beta^2} \cr &\lambda^\prime = \frac{d\lambda}{d\beta}&\implies\,\, &d\lambda = \lambda^\prime\,d\beta = \frac{\lambda^\prime}{\beta}\,p^Tdp \cr &\mu^\prime = \frac{d\mu}{d\beta}&\implies\,\, &d\mu = \mu^\prime\,d\beta = \frac{\mu^\prime}{\beta}\,p^Tdp \cr }$

The differential of Rodrigues' formula is $\eqalign{ dR &= \lambda\,dP + \mu\,dP^2 + P\,d\lambda + P^2\,d\mu \cr &= \lambda\,dP + \mu(P\,dP+dP\,P) + (\lambda^\prime P+\mu^\prime P^2)\,d\beta \cr }$ Now vectorize this differential and find its Jacobian with respect to $p$ $\eqalign{ dr &= {\rm vec}(dR) \cr &= \lambda V\,dp + \mu(I\otimes P+P^T\otimes I)V\,dp + \frac{1}{\beta}\Big(\lambda^\prime Vp+\mu^\prime(I\otimes P)Vp\Big)p^Tdp \cr J&= \frac{\partial r}{\partial p} \cr &= \lambda V + \mu(I\otimes P-P\otimes I)V + \frac{1}{\beta}\Big(\lambda^\prime V+\mu^\prime(I\otimes P)V\Big)pp^T \cr &= \lambda V + \mu(I\otimes P-P\otimes I)V + \beta\Big(\lambda^\prime V+\mu^\prime(I\otimes P)V\Big)\frac{P^2+\beta^2I}{\beta^2} \cr &= \lambda V + \mu(G-H)V + \beta\big(\lambda^\prime V+\mu^\prime GV\big)(I+F^2) \cr }$ where the final line was simplified by introducing the following variables $\eqalign{ F &= \beta^{-1}P \cr G &= I\otimes P \cr H &= P\otimes I \cr }$
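To close the loop with the original question, the following Octave/MATLAB sketch assembles this closed-form $9\times 3$ Jacobian and compares it with a central finite difference of ${\rm vec}(R(p))$; the test point and step size are arbitrary:

    % Sketch: assemble J from the final line above and check it against finite differences.
    hat = @(v) [0 -v(3) v(2); v(3) 0 -v(1); -v(2) v(1) 0];
    p  = [0.3; -0.2; 0.5];
    b  = norm(p);                                      % beta
    P  = hat(p);
    Vt = [0 0 0  0 0 1  0 -1 0;
          0 0 -1 0 0 0  1  0 0;
          0 1 0 -1 0 0  0  0 0];
    V  = Vt';
    lam  = sin(b)/b;               mu  = (1-cos(b))/b^2;
    lamp = (b*cos(b)-sin(b))/b^2;  mup = (b*sin(b)-2*(1-cos(b)))/b^3;   % lambda', mu'
    G = kron(eye(3),P);  H = kron(P,eye(3));  F = P/b;
    J = lam*V + mu*(G-H)*V + b*(lamp*V + mup*G*V)*(eye(3)+F^2);
    h = 1e-6;  Jn = zeros(9,3);                        % numerical Jacobian
    for k = 1:3
      e = zeros(3,1);  e(k) = h;
      Rp = expm(hat(p+e));  Rm = expm(hat(p-e));
      Jn(:,k) = (Rp(:) - Rm(:)) / (2*h);               % central difference of vec(R)
    end
    disp(norm(J - Jn))                                 % should be tiny (~1e-10)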