16

I am working on an algorithm that is supposed to align a pair of images. The motion model, which describes the pose $p$ of one image with respect to the other in 3D space, is purely rotational.

Theory behind my question:

According to Rodrigues' formula for rotation matrices, the matrix exponential $R(p)=e^{\hat{p}}$ for a vector $p=(p_x,p_y,p_z)^T\in\mathbb{R}^3$ is given by $$ \begin{align} R(p)&=e^{\hat{p}}=I + \frac{\hat{p}}{||p||} \sin(||p||) + \frac{\hat{p}^2}{||p||^2} (1-\cos(||p||)) \end{align} $$ where $||p||=\sqrt{p_x^2+p_y^2+p_z^2}$ and $$ \begin{align} \hat{p} &= \begin{bmatrix} 0 & -p_z & p_y \\ p_z & 0 & -p_x \\ -p_y & p_x & 0 \end{bmatrix} \end{align}. $$

Writing out the entries of $R(p)$ yields $$ \begin{align} R(p) = \begin{bmatrix} 1-(p_y^2+p_z^2)s & -p_z r+p_xp_y s & p_y r+p_xp_z s \\ p_z r+p_xp_y s & 1-(p_x^2+p_z^2)s & -p_x r+p_yp_z s \\ -p_y r+p_xp_z s & p_x r+p_yp_z s & 1-(p_x^2+p_y^2)s \end{bmatrix} \end{align} $$ where $r=\frac{\sin(||p||)}{||p||}$ and $s=\frac{1-\cos(||p||)}{||p||^2}$ were introduced for clarity.
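
As a sanity check, this expanded matrix can be compared directly against the matrix exponential; a minimal Octave/MATLAB sketch (the helper names hat and rodrigues are just mine):

    % Sketch: compare the closed-form Rodrigues matrix with expm(hat(p)).
    hat = @(p) [0 -p(3) p(2); p(3) 0 -p(1); -p(2) p(1) 0];   % skew-symmetric matrix of p
    rodrigues = @(p) eye(3) + sin(norm(p))/norm(p) * hat(p) ...
                   + (1-cos(norm(p)))/norm(p)^2 * hat(p)^2;
    p = [0.3; -0.2; 0.5];                       % some test rotation vector
    disp(norm(rodrigues(p) - expm(hat(p))));    % should be close to machine precision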

Now, let $R^s(p)\in\mathbb{R}^{9\times 1}$ be the stacked version of $R(p)$, i.e. all columns are stacked into one column and let $R^s_i(p)$ denote the $i$-th element of $R^s(p)$ where $1\leq i \leq 9$.

The Jacobian matrix $J_{R}\in\mathbb{R}^{9\times 3}$ with respect to vector $p$ is then given by $$ \begin{align} J_{R}&=\frac{\partial R^s}{\partial p} \\\\ &= \begin{bmatrix} \frac{\partial R^s_1}{\partial p_x} & \frac{\partial R^s_1}{\partial p_y} & \frac{\partial R^s_1}{\partial p_z} \\\\ \vdots & \vdots & \vdots \\\\ \frac{\partial R^s_9}{\partial p_x} & \frac{\partial R^s_9}{\partial p_y} & \frac{\partial R^s_9}{\partial p_z} \end{bmatrix} \end{align} $$

The first element of $J_R$ can then be expressed as $$ \begin{align} \frac{\partial R^s_1}{\partial p_x} &=\frac{\partial}{\partial p_x} \Bigl(1-(p_y^2+p_z^2)\frac{1-\cos(||p||)}{||p||^2}\Bigr) \\\\ &=-\Bigl[(p_y^2+p_z^2)\frac{\sin(||p||)\, p_x\, ||p|| - 2 p_x (1-\cos(||p||))}{||p||^4}\Bigr] \end{align} $$ the second one as $$ \begin{align} \frac{\partial R^s_1}{\partial p_y} &=\frac{\partial}{\partial p_y} \Bigl(1-(p_y^2+p_z^2)\frac{1-\cos(||p||)}{||p||^2}\Bigr) \\\\ &=-\Bigl[2p_y\frac{1-\cos(||p||)}{||p||^2}+(p_y^2+p_z^2)\frac{\sin(||p||)\, p_y\, ||p|| - 2 p_y (1-\cos(||p||))}{||p||^4}\Bigr] \end{align} $$ and the third element of the first row as $$ \begin{align} \frac{\partial R^s_1}{\partial p_z} &=\frac{\partial}{\partial p_z} \Bigl(1-(p_y^2+p_z^2)\frac{1-\cos(||p||)}{||p||^2}\Bigr) \\\\ &=-\Bigl[2p_z\frac{1-\cos(||p||)}{||p||^2}+(p_y^2+p_z^2)\frac{\sin(||p||)\, p_z\, ||p|| - 2 p_z (1-\cos(||p||))}{||p||^4}\Bigr] \end{align} $$
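
One way I can double-check these three expressions is against central differences; a rough Octave/MATLAB sketch (the variable names are just placeholders):

    % Sketch: central-difference check of the three partials of R^s_1 derived above.
    r11 = @(p) 1 - (p(2)^2 + p(3)^2) * (1 - cos(norm(p)))/norm(p)^2;
    p = [0.4; 0.1; -0.7];  b = norm(p);  s = (1 - cos(b))/b^2;
    % analytic partials (first row of J_R)
    ana = -[ (p(2)^2+p(3)^2) * (sin(b)*p(1)*b - 2*p(1)*(1-cos(b)))/b^4; ...
             2*p(2)*s + (p(2)^2+p(3)^2) * (sin(b)*p(2)*b - 2*p(2)*(1-cos(b)))/b^4; ...
             2*p(3)*s + (p(2)^2+p(3)^2) * (sin(b)*p(3)*b - 2*p(3)*(1-cos(b)))/b^4 ];
    num = zeros(3,1);  h = 1e-6;
    for k = 1:3
      e = zeros(3,1);  e(k) = h;
      num(k) = (r11(p+e) - r11(p-e)) / (2*h);   % numeric partial
    end
    disp([ana num])   % the two columns should agree to roughly 1e-9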

Already a big thanks for reading this far ;-)

My question: Is the derivation of these first 3 elements of $J_R$ mathematically correct? If I did something wrong, can you spot the error(s) and possibly help me correct it(/them)?

Once I have got everything straight, I will post the complete Jacobian matrix as an answer to this question, so anyone confronted with the same problem can double-check their own results ...

The reason I need this Jacobian:

The pose $p$ of an image in the application mentioned above can be described by a unit rotation axis and an angle (in radians). Alternatively, the unit axis can be scaled by the rotation angle, which is what I am using. To do the exponential mapping from $\mathbb{R}^3$ to SO(3) I use Rodrigues' formula (which is the standard method, I think).

The algorithm "slides" down the gradient of an error function, which is governed by the pose $p$, i.e. $$E(p)=\sum_i(I_1(x'(x_i;p))-I_0(x_i))^2$$ where $x'(\cdot)$ is the function that maps pixels from one image space to the other image space with respect to the current pose $p$.

In order to compute the gradient of $E(p)$ I will have to compute the gradient of the rotation matrices, which leads to my actual problem.
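
To make the connection explicit, the chain rule that brings the Jacobian of the rotation matrix into the gradient of $E$ looks roughly like this (assuming $x'$ depends on $p$ only through the entries of $R(p)$, and writing $\frac{\partial I_1}{\partial x'}$ for the image gradient):

$$\frac{\partial E}{\partial p} = \sum_i 2\,\bigl(I_1(x'(x_i;p))-I_0(x_i)\bigr)\,\frac{\partial I_1}{\partial x'}\,\frac{\partial x'}{\partial R^s}\,\frac{\partial R^s}{\partial p},$$

where the last factor is exactly the Jacobian $J_R$ I am after.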

  • 0
    You're right, using those four parameters makes no sense. Why is $\frac{\partial \theta}{\partial \omega_x}=\frac{\omega_x}{\theta}$? From how you introduced these, $\omega_x$ is a component of the unit vector $\omega$, which is independent of $\theta$. It looks a bit like you confused $\theta$ with an angle in a spherical coordinate representation of $\omega$? I think you'll want to take partial derivatives with respect to the components of $p$, but to say more it would be good to know what you're actually doing -- what do you need the Jacobian for?2011-09-13
  • 0
    I use a purely rotational motion model to align a pair of images. The pose parameters, i.e. the rotation axis and angle, are updated according to the gradient of an error function that measures the overlap between the two images. In that gradient I come across the gradient of the rotation matrices. That's what I need it for.2011-09-14

7 Answers

13

I don't think you need the general Jacobian $\frac{\partial}{\partial \mathbf p}\exp(\hat{\mathbf p})$, but only the much simpler Jacobian $\left.\frac{\partial}{\partial \mathbf p}\exp(\hat{\mathbf p})\right|_{\mathbf p=\mathbf 0}$, evaluated at the identity, i.e. at $\mathbf p=\mathbf 0$.

Background

The group of 3D rotations SO(3) is a matrix Lie group. Thus, in general, we have the matrix exponential:

$$\exp(\mathtt M) := \sum_{k\ge 0} \frac{1}{k!} \mathtt M^k$$

which maps an element of the matrix Lie algebra onto the set of matrix Lie group elements. Furthermore, we have a function $$\hat \cdot: \mathbb R^n \rightarrow \mathbb R^{m\times m}, \quad \hat{\mathbf a} = \sum_{i=1}^n a_i\mathtt G_i$$ which maps an $n$-vector onto the set of matrix Lie algebra elements (= matrices). For SO(3) the generators are: $$ \mathtt G_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix},\quad \mathtt G_2 =\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix},\quad \mathtt G_3 =\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$

Derivative at the identity

For matrix Lie groups, it can be shown that $$ \left.\frac{\partial}{\partial a_k}\exp(\hat {\mathbf a})\right|_{\mathbf a=\mathbf 0} = \mathtt G_k.$$

Thus for SO(3) $$ \left.\frac{\partial}{\partial \mathbf p}\exp(\hat {\mathbf p})\right|_{\mathbf p=\mathbf 0} = \begin{bmatrix}\mathtt G_1 & \mathtt G_2 & \mathtt G_3\end{bmatrix}$$ we obtain a row vector of three $3\times 3$ matrices, hence a $3\times 3 \times 3$ Jacobian tensor. (Alternatively, one could stack the columns of each $\mathtt G_i$ into a 9-vector and obtain a $9\times 3$ Jacobian matrix.)
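
This identity is easy to verify numerically; a small Octave/MATLAB sketch using central differences around $\mathbf p = \mathbf 0$:

    % Sketch: check that d/dp_k exp(hat(p)) at p = 0 equals the generator G_k.
    G1 = [0 0 0; 0 0 -1; 0 1 0];
    G2 = [0 0 1; 0 0 0; -1 0 0];
    G3 = [0 -1 0; 1 0 0; 0 0 0];
    hat = @(a) a(1)*G1 + a(2)*G2 + a(3)*G3;
    Gs = {G1, G2, G3};  h = 1e-6;
    for k = 1:3
      e = zeros(3,1);  e(k) = h;
      D = (expm(hat(e)) - expm(hat(-e))) / (2*h);   % central difference at p = 0
      disp(max(abs(D(:) - Gs{k}(:))))               % should be roughly 1e-13
    end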

Background: Least-squares optimization

Let us assume we would like to minimize the function $F:\mathbb R^n \rightarrow \mathbb R$ with respect to a Euclidean vector $\mathbf x$. Furthermore, let us assume we have a least-squares problem with

$$F = f(\mathbf x)^\top f(\mathbf x)$$

Following Gauss-Newton we repeatedly solve for an update $\delta$ $$ \frac{\partial f}{\partial \mathbf x^{(m)}}^\top\frac{\partial f}{\partial \mathbf x^{(m)}} \delta = -\frac{\partial f}{\partial \mathbf x^{(m)}}^\top f(\mathbf x^{(m)})$$ and update our estimate $$ \mathbf x^{(m+1)}=\delta + \mathbf x^{(m)}$$
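
As a small Euclidean illustration of the scheme (the residual here is made up purely for demonstration):

    % Sketch: Gauss-Newton on a toy residual f(x) = [x(1)^2 - 2; x(1)*x(2) - 1].
    f  = @(x) [x(1)^2 - 2; x(1)*x(2) - 1];
    Jf = @(x) [2*x(1) 0; x(2) x(1)];        % Jacobian df/dx
    x  = [1; 1];
    for m = 1:5
      J = Jf(x);
      delta = -(J'*J) \ (J'*f(x));          % Gauss-Newton normal equations
      x = x + delta;                        % additive update (Euclidean case)
    end
    disp(x)                                 % approx [sqrt(2); 1/sqrt(2)]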

Least-squares optimization on matrix Lie groups

This scheme is only valid for Euclidean vector spaces and needs to be adapted for matrix Lie groups. In particular, we calculate the derivative in the tangent space around the identity:

$$\mathtt J :=\left.\frac{\partial f(\exp(\hat{\mathbf p})\cdot\mathtt R^{(m)})}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}$$ (This can be calculated from the generators using the chain rule.)

Then we solve for $\delta$:

$$ \mathtt J^\top\mathtt J \delta = -\mathtt J^\top f(\mathtt R^{(m)})$$

Finally we adapt the update rule: $$ \mathtt R^{(m+1)}= \exp(\hat\delta)\cdot \mathtt R^{(m)}.$$
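
A minimal sketch of the whole scheme on a toy problem (aligning a rotated point set; the residual and all names are made up for illustration and are not the image-alignment application from the question):

    % Sketch: Gauss-Newton on SO(3) with left-multiplicative update R <- exp(hat(delta))*R.
    hat = @(v) [0 -v(3) v(2); v(3) 0 -v(1); -v(2) v(1) 0];
    A = [1 0 0 1; 0 1 0 2; 0 0 1 -1];        % source points (columns)
    R_true = expm(hat([0.4; -0.2; 0.3]));
    B = R_true * A;                           % target points
    R = eye(3);                               % initial estimate
    for m = 1:10
      res = R*A - B;  r = res(:);             % stacked residual f(R)
      J = zeros(numel(r), 3);
      for k = 1:3                             % d/dp_k [exp(hat(p))*R*A] at p = 0 is G_k*R*A
        e = zeros(3,1);  e(k) = 1;
        Jk = hat(e) * R * A;
        J(:,k) = Jk(:);
      end
      delta = -(J'*J) \ (J'*r);
      R = expm(hat(delta)) * R;               % multiplicative update on the manifold
    end
    disp(norm(R - R_true))                    % should be close to zero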

  • 0
    Thanks a lot for your detailed and clear answer! Only two things keep me wondering: (1) why is the simpler gradient sufficient for a motion update? What is the intuition behind it? (2) my approach, though more complicated and computationally more expensive, is not wrong, is it?2011-09-20
  • 0
    (2) I think your approach is alright (though I didn't check your Jacobians in detail). I doubt that your approach will reach the same convergence rate, since you are using an additive update on the rotations (axis-angle) which I guess is not 100% correct for larger updates: $\exp(\mathbf a +\mathbf b) \neq \exp(\mathbf a)\cdot \exp(\mathbf b)$.2011-09-20
  • 1
    (1) This is the miracle of manifolds: Being locally Euclidean. Basically it is like the sine function, which can be approximated by a line with slope 1 for $x$ close to the identity: $\frac{\partial \sin(x)}{\partial x}|_{x=0} = 1$2011-09-20
  • 0
    Ahaaa!, then the underlying assumption in (1) is that the update step $\delta$ in each iteration is "close enough" to zero (i.e. $p=0$) so that using the Jacobian at $p=0$ is still "accurate enough". Is that correct? In any case thanks a bunch for all the clarification.2011-09-20
  • 0
    Yeah that's the main idea! But note that the Jacobian is exact for $p=0$. Also, we always try to approximate the neighborhood of the point $p$ with a Jacobian $\mathtt J(p)$, but $\mathtt J(p)$ is exactly valid only at the point $p$ (for general differentiable functions).2011-09-20
  • 0
    Hauke, I still don't get this line: $\frac{\partial}{\partial a_{k}}\exp(\hat{a})=G_{k}$. I have found in another [work](http://www.cs.bath.ac.uk/brown/papers/ijcv2007.pdf) (equation 23) this: $\frac{\partial}{\partial a_{k}}\exp(\hat{a})=\exp(\hat{a})\cdot G_{k}$. Can you please clarify this for me?2012-02-27
  • 0
    Note that $a=0$ here! Thus, $\exp(\hat{a})=\exp(0)=I$.2012-02-27
  • 0
    Oh, I see. So the derivative of exponential map is $G_{k}$ for $a$ being a null vector, but can be approximated by $\exp(\hat{a})\cdot G_{k}$ otherwise. Is that right? Sorry, I am not much into this kind of calculus.2012-02-27
  • 0
    That's correct, but there is no approximation. Let me write your (equation 23) in a slightly different, slightly more general way: $\frac{\partial}{\partial t}\exp(tA)= A\exp(tA) = \exp(tA)A$. That is (one of) the main reasons why we call the matrix exponential "exponential", since it behaves very similarly to the standard exponential function: It is its own derivative.2012-02-28
  • 0
    We tested the closed formula $\frac{\partial \mathbf{e}_i(\mathbf{\omega})}{\partial \mathbf{\omega}_k} = \exp(\omega) \cdot \mathbf{G}_k$ and compared it to the Padé approximation in our optimization problem, obtaining slower convergence. We have added a question about that here: http://math.stackexchange.com/questions/812563/jacobian-of-exponential-mapping-in-so3-se3.2014-05-28
  • 0
    Why do you pre-multiply $\exp(\hat{\mathbf p})$? Why not $$\mathtt J :=\left.\frac{\partial f(\mathtt R^{(m)} \cdot \exp(\hat{\mathbf p}) )}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}$$?2016-11-11
  • 1
    This is a very good question. In a nutshell, both options work, but one has to be consistent: If you post-multiply in the Jacobian, you also have to post-multiply in the update rule.2016-12-26
2

The approach of least-squares optimization on matrix Lie groups is also explained in the technical report "Minimization on the Lie Group SO(3) and Related Manifolds".

  • 0
    I have a doubt: to optimize an objective function $\mathcal O(\mathtt R^{(m)})$: $$\mathcal O(\mathtt R^{(m+1)}) \approx \mathcal O(\mathtt R^{(m)}) + \mathbf g^\top\delta + \delta^\top \mathtt H \delta$$ The gradient $\mathbf g$ and Hessian $\mathtt H$ are calculated at the point corresponding to $\mathtt R^{(m)}$, which is initially $\mathbf p = \mathbf 0$. My question is: as soon as we update $\mathtt R^{(m)}$, it no longer corresponds to $\mathbf p = \mathbf 0$. So, don't we need to calculate the gradient and Hessian at a different $\mathbf p$, i.e. $\mathbf p^{(m+1)} = \delta + \mathbf p^{(m)}$?2012-08-09
  • 1
    I know this is old, but just for clarification: The whole point of the report is that the gradient and the Hessian are computed w.r.t. local coordinates, i.e. you choose your coordinates so that the matrix from the previous step is at the origin. Thanks for the interesting read! The report uses the same techniques as in Hauke Strasdat's excellent answer.2013-08-08
  • 0
    The report uses $R(w) = R_0 \exp(\delta)$ whereas @Hauke uses left-multiplication. Why?2016-11-03
2

A recent paper by Guillermo Gallego and Anthony Yezzi gives a compact formula for the derivative of the rotation matrix with respect to the exponential-map coordinates: http://arxiv.org/pdf/1312.0788v1.pdf

Formula (III.7) on page 5 is what you were looking for.

  • 0
    It is hard to connect this question with this formula in the paper.2015-05-22
1
  1. question: Not a complete answer, but shouldn't you differentiate with respect to $p$ and not $\omega$? Though maybe you can compute the derivative with respect to $\omega$ as a pre-step, and then differentiate that with respect to $p$.

  2. question: The Jacobian is not a $9\times 4$ matrix because the 4 values of a 4-component representation are not independent of each other. Either you take an arbitrary axis with an angle - this would not be unique - or you take a unit vector with an angle - then the vector components cannot be chosen freely, as they need to fulfill $\sqrt{x^2+y^2+z^2}=1$. So you only need 3 independent parameters to represent a rotation. Therefore the Jacobian is $9\times 3$ and not $9\times 4$.

  • 0
    Yes, you are right. It should be derived for $p$ not $\omega$. And thank you for the clarification in Q2, too.2011-09-14
  • 0
    I have edited and rephrased my question. Would you mind taking a 2nd look?2011-09-14
1

At this point I am quite certain that I computed the first three elements of $J_R$ correctly. So I can continue computing the remaining elements of the matrix.

One way to confirm that the analytic gradient is correct is to compute the numeric gradient and compare the two. For this you might want to check the gradient() function in MATLAB or Octave.

The script I wrote goes something like this:

    S = 1;
    [x,y,z] = meshgrid(-2*pi:S:2*pi);
    v = f(x,y,z);              % this function is the one at element R(0,0)
    [dx,dy,dz] = df(x,y,z);    % these are the three partial gradients above
    [ndx,ndy,ndz] = gradient(v,S);
    figure
    hold on;
    quiver3(x,y,z,ndx,ndy,ndz, 'color','red','LineWidth',1);
    quiver3(x,y,z,dx,dy,dz, 'color','green','LineWidth',1);
    hold off;
    legend('numeric gradient','analytic gradient');

If all the arrows have approximately the same orientation and length, then this is a good indicator that the derived partial gradients are correct.

1

There are also a few other options:

  • compute the derivatives using quaternions - this article explains how to differentiate the exponential map this way
  • use a simpler Jacobian based on angular velocity - note that in minimization, the underlying model should be locally as linear as possible; simpler derivatives lead to a more stable method (I think)
  • use numerical differentiation
1

In this problem, the norm of the vector is an important parameter $$\eqalign{ \beta^2 &= \|p\|^2 = p^Tp \cr \beta\,d\beta &= p^Tdp \cr d\beta &= \frac{p^Tdp}{\beta} \cr }$$

The skew matrix has some very interesting properties $$\eqalign{ P^T &= -P \cr P^2 &= pp^T - \beta^2I \cr }$$

The above relationships are well known; now here is something less well known.
Vectorizing $P$ is equivalent to multiplying $p$ by a special $(9\times 3)$ matrix $$\eqalign{ {\rm vec}(P) &= V p \cr }$$

To save vertical space, here is the transpose of the vectorization matrix $$ V^T = \begin{bmatrix} 0& 0&\,\,\,\,0&\,\,\,\,0& 0& 1&\,\,\,\,0& -1& 0\\ 0& 0& -1&\,\,\,\,0& 0& 0&\,\,\,\,1&\,\,\,\,0& 0\\ 0& 1&\,\,\,\,0& -1& 0& 0&\,\,\,\,0&\,\,\,\,0& 0\\ \end{bmatrix} $$

Now consider the rotation matrix generated by applying the exponential function to $P$. Using Cayley-Hamilton, this can be reduced to a quadratic function of $P$, i.e. Rodrigues' rotation formula. $$\eqalign{ R &= \exp(P) = I + \lambda P + \mu P^2 \cr }$$

The coefficients $(\lambda, \mu)$ are functions of the parameter $(\beta)$, and it will be convenient to have expressions for their derivatives and differentials $$\eqalign{ &\lambda = \frac{\sin(\beta)}{\beta}&\,\,\,&\mu = \frac{1-\cos(\beta)}{\beta^2} \cr &\lambda^\prime = \frac{d\lambda}{d\beta}&\implies\,\, &d\lambda = \lambda^\prime\,d\beta = \frac{\lambda^\prime}{\beta}\,p^Tdp \cr &\mu^\prime = \frac{d\mu}{d\beta}&\implies\,\, &d\mu = \mu^\prime\,d\beta = \frac{\mu^\prime}{\beta}\,p^Tdp \cr }$$

The differential of Rodrigues' formula is $$\eqalign{ dR &= \lambda\,dP + \mu\,dP^2 + P\,d\lambda + P^2\,d\mu \cr &= \lambda\,dP + \mu(P\,dP+dP\,P) + (\lambda^\prime P+\mu^\prime P^2)\,d\beta \cr }$$ Now vectorize this differential and find its Jacobian with respect to $p$ $$\eqalign{ dr &= {\rm vec}(dR) \cr &= \lambda V\,dp + \mu(I\otimes P+P^T\otimes I)V\,dp + \frac{1}{\beta}\Big(\lambda^\prime Vp+\mu^\prime(I\otimes P)Vp\Big)p^Tdp \cr J&= \frac{\partial r}{\partial p} \cr &= \lambda V + \mu(I\otimes P-P\otimes I)V + \frac{1}{\beta}\Big(\lambda^\prime V+\mu^\prime(I\otimes P)V\Big)pp^T \cr &= \lambda V + \mu(I\otimes P-P\otimes I)V + \beta\Big(\lambda^\prime V+\mu^\prime(I\otimes P)V\Big)\frac{P^2+\beta^2I}{\beta^2} \cr &= \lambda V + \mu(G-H)V + \beta\big(\lambda^\prime V+\mu^\prime GV\big)(I+F^2) \cr }$$ where the final line was simplified by introducing the following variables $$\eqalign{ F &= \beta^{-1}P \cr G &= I\otimes P \cr H &= P\otimes I \cr }$$
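
As a sanity check, this closed-form Jacobian can be compared against a central-difference Jacobian of ${\rm vec}(\exp(\hat p))$; a rough sketch in Octave/MATLAB notation (helper names are mine):

    % Sketch: evaluate J = lam*V + mu*(G-H)*V + beta*(lam'*V + mu'*G*V)*(I+F^2)
    % and compare it with a numeric Jacobian of vec(expm(hat(p))).
    hat = @(v) [0 -v(3) v(2); v(3) 0 -v(1); -v(2) v(1) 0];
    p = [0.3; -0.5; 0.7];   P = hat(p);   b = norm(p);
    lam  = sin(b)/b;                          mu  = (1 - cos(b))/b^2;
    dlam = (b*cos(b) - sin(b))/b^2;           % lambda'
    dmu  = (b*sin(b) - 2*(1 - cos(b)))/b^3;   % mu'
    V = zeros(9,3);                           % vec(hat(p)) = V*p
    for k = 1:3
      e = zeros(3,1);  e(k) = 1;  Pk = hat(e);  V(:,k) = Pk(:);
    end
    G = kron(eye(3), P);  H = kron(P, eye(3));  F = P/b;
    J = lam*V + mu*(G - H)*V + b*(dlam*V + dmu*G*V)*(eye(3) + F^2);
    Jn = zeros(9,3);  h = 1e-6;               % numeric Jacobian
    for k = 1:3
      e = zeros(3,1);  e(k) = h;
      Rp = expm(hat(p+e));  Rm = expm(hat(p-e));
      Jn(:,k) = (Rp(:) - Rm(:)) / (2*h);
    end
    disp(max(abs(J(:) - Jn(:))))              % should be roughly 1e-10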