
Referring to a question previously asked, Jacobian matrix of the Rodrigues' formula (exponential map): it was suggested in one of the answers to only calculate the simpler Jacobian $\left.\frac{\partial}{\partial \mathbf p}\exp(\hat{\mathbf p})\right|_{\mathbf p=\mathbf 0}$, with $\exp(\hat{\mathbf p})$ being the identity at $\mathbf p=\mathbf 0$.

A similar idea is suggested in the technical report Minimization on the Lie Group SO(3) and Related Manifolds.

Using a Gauss-Newton-based strategy, it was suggested to solve for $\delta$:

$ \mathtt J^\top\mathtt J \delta = -\mathtt J^\top f(\mathtt R^{(m)}) \quad\text{with}\quad \mathtt J :=\left.\frac{\partial f(\exp(\hat{\mathbf p})\cdot\mathtt R^{(m)})}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}\;.$

Final update rule: $ \mathtt R^{(m+1)}= \exp(\hat\delta)\cdot \mathtt R^{(m)}.$
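For concreteness, here is a minimal sketch of one such update step in Python (assuming the standard hat operator for $\mathfrak{so}(3)$ and `scipy.linalg.expm`; the value of `delta` is a placeholder, not an actual solution of the normal equations):

```python
import numpy as np
from scipy.linalg import expm

def hat(p):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    return np.array([[0.0,  -p[2],  p[1]],
                     [p[2],  0.0,  -p[0]],
                     [-p[1], p[0],  0.0]])

R = np.eye(3)                           # current estimate R^(m)
delta = np.array([0.01, -0.02, 0.03])   # placeholder for the Gauss-Newton step
R_next = expm(hat(delta)) @ R           # R^(m+1) = exp(hat(delta)) . R^(m)
```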

Gauss-Newton stems from a Taylor series approximation: to optimize an objective function $\mathcal O$, one expands $\mathcal O(\mathtt R^{(m+1)}) \approx \mathcal O(\mathtt R^{(m)}) + \mathbb g^\top\delta + \tfrac{1}{2}\delta^\top \mathtt H \delta~.$

The gradient $\mathbb g$ and the Hessian $\mathtt H$ are calculated at the point corresponding to $\mathtt R^{(m)}$, which is initially $\mathbf p = \mathbf 0$.

My question: as soon as we update $\mathtt R^{(m)}$, it no longer corresponds to $\mathbf p = \mathbf 0$. So it seems we should calculate the gradient and the Hessian at a different $\mathbf p$, namely $\mathbf p^{(m+1)} = \delta + \mathbf p^{(m)}$. Can someone explain the idea behind only ever using the gradient and Hessian at $\mathbf p = \mathbf 0$? (This is what is suggested in the technical report and in the question referred to above.)


3 Answers

5

I will try to resolve the confusion, not with a rigorous mathematical argument, but by illustrating the similarity between standard Gauss-Newton over Euclidean space and Gauss-Newton with respect to a matrix Lie group.

Let us first look at the derivative of a scalar function $f$: $\frac{\partial f(x)}{\partial x} := \lim_{y \to x} \frac{f(y)-f(x)}{y-x}~.$ Substituting $y = p + x$ gives $\lim_{y \to x} \frac{f(y)-f(x)}{y-x} \overset{y=p+x}{=} \lim_{p \to 0} \frac{f(p+x)-f(x)}{p} =: \left.\frac{\partial f(p+x)}{\partial p}\right|_{p=0}~.$

Thus, we get: $\frac{\partial f(x)}{\partial x} = \left.\frac{\partial f(p+x)}{\partial p}\right|_{p=0}~.$

In other words, any derivative $\frac{\partial f(x)}{\partial x}$ can be written as a derivative around zero $\left.\frac{\partial f(p+x)}{\partial p}\right|_{p=0}$. (Source: Derivative as derivative around zero?)
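A quick numerical sanity check of this identity (a sketch using an arbitrary test function; note that the two central differences are literally the same expression, which is exactly the content of the identity):

```python
import numpy as np

def f(x):
    return np.sin(x) * x**2   # arbitrary test function

x, h = 1.3, 1e-6

# ordinary central difference for d f(x) / d x
d_direct = (f(x + h) - f(x - h)) / (2 * h)

# the same derivative written "around zero": d f(p + x) / d p at p = 0
p = 0.0
d_around_zero = (f((p + h) + x) - f((p - h) + x)) / (2 * h)

print(d_direct, d_around_zero)   # identical
```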

Using a similar argument, we can show that the same holds for the partial derivatives of a function defined over a Euclidean vector space: $\frac{\partial f(\mathbf x)}{\partial x_i} = \left.\frac{\partial f(\mathbf p+ \mathbf x)}{\partial p_i}\right|_{\mathbf p=\mathbf 0}~.$

Hence, the standard Gauss-Newton scheme can be rewritten as:

$ \mathbf J^\top\mathbf J \delta = -\mathbf J^\top f(\mathbf x_m) \quad\text{with}\quad \mathbf J :=\left.\frac{\partial f(\mathbf p+\mathbf x_m)}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}$

with the update rule: $ \mathbf x_{m+1}= \delta + \mathbf x_m~.$

Thus, we can conclude that even the standard Gauss-Newton approach can be seen as an optimisation around the identity ($\mathbf p =\mathbf 0$). Obviously, the Jacobian needs to be recalculated after each update!
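To make this concrete, here is a small sketch of the scheme above (a hypothetical exponential model with synthetic data; the Jacobian of $\mathbf p \mapsto f(\mathbf p + \mathbf x_m)$ is taken at $\mathbf p = \mathbf 0$ by finite differences for brevity):

```python
import numpy as np

def residuals(x, t, y):
    # hypothetical model: x[0] * exp(x[1] * t), fitted to data y
    return x[0] * np.exp(x[1] * t) - y

def jacobian_at_zero(x_m, t, y, eps=1e-7):
    """Numerical Jacobian of p -> f(p + x_m), evaluated at p = 0."""
    J = np.zeros((len(t), len(x_m)))
    for i in range(len(x_m)):
        p = np.zeros_like(x_m)
        p[i] = eps
        J[:, i] = (residuals(x_m + p, t, y) - residuals(x_m - p, t, y)) / (2 * eps)
    return J

t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)            # synthetic data
x = np.array([1.0, -1.0])             # initial guess x_0

for _ in range(10):
    J = jacobian_at_zero(x, t, y)     # recalculated after each update!
    r = residuals(x, t, y)
    delta = np.linalg.solve(J.T @ J, -J.T @ r)
    x = delta + x                     # x_{m+1} = delta + x_m

print(x)                              # converges toward [2.0, -1.5]
```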


Maybe this answers the question already, but let me continue:

A matrix Lie group can be seen as a generalization of a Euclidean vector space; indeed, the Euclidean vector space is a trivial example of a matrix Lie group. We represent a vector $\mathbf a \in \mathbb{R}^n$ as the $(n+1)\times (n+1)$ matrix $\mathbf{A} := \begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&1 \end{bmatrix}~.$ Vector addition then becomes matrix multiplication, $\begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&1 \end{bmatrix}\cdot \begin{bmatrix}\mathbf{I}_{n\times n}&\mathbf b\\\mathbf{O}_{1\times n}&1 \end{bmatrix} = \begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a+\mathbf b\\\mathbf{O}_{1\times n}&1 \end{bmatrix}~,$ and, writing $\hat{\mathbf a} := \begin{bmatrix}\mathbf{O}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&0 \end{bmatrix}$, the matrix exponential takes the simple form $\exp(\hat{\mathbf a}) = \mathbf I + \hat{\mathbf a} = \begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&1 \end{bmatrix}$, since $\hat{\mathbf a}^2 = \mathbf O$ terminates the series.
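A tiny sketch verifying this embedding numerically (assuming `scipy.linalg.expm`; the `hat` map here is the one just defined):

```python
import numpy as np
from scipy.linalg import expm

def hat(a):
    """Embed a vector in the Lie algebra: a in the last column, zeros elsewhere."""
    n = len(a)
    H = np.zeros((n + 1, n + 1))
    H[:n, n] = a
    return H

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0, 2.0])

A = expm(hat(a))   # equals [[I, a], [0, 1]] because hat(a)^2 = 0
B = expm(hat(b))
print(A @ B)       # top-right column is a + b: addition as multiplication
```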

Using these new conventions, we can rewrite the Gauss-Newton scheme ($\mathbf A$ instead of $\mathbf a$, matrix multiplication instead of vector addition, $\exp(\hat{\mathbf p})$ instead of $\mathbf p$). Finally, we get the following Jacobian and update rule for the generalized Gauss-Newton scheme:

$\mathbf J :=\left.\frac{\partial f(\exp(\hat{\mathbf p})\cdot\mathbf A^{(m)})}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}$

and

$ \mathbf A^{(m+1)}= \exp(\hat\delta)\cdot \mathbf A^{(m)}.$
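As an illustration, here is a compact sketch of this generalized scheme on SO(3) (a hypothetical point-alignment problem with synthetic data; the Jacobian of $\mathbf p \mapsto f(\exp(\hat{\mathbf p})\cdot\mathbf A^{(m)})$ is again taken at $\mathbf p = \mathbf 0$ by finite differences):

```python
import numpy as np
from scipy.linalg import expm

def hat(p):
    """so(3) hat operator."""
    return np.array([[0.0,  -p[2],  p[1]],
                     [p[2],  0.0,  -p[0]],
                     [-p[1], p[0],  0.0]])

# synthetic task: recover R_true from point correspondences Y = R_true @ X
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))
R_true = expm(hat(np.array([0.4, -0.2, 0.7])))
Y = R_true @ X

def residuals(R):
    return (R @ X - Y).ravel()

R = np.eye(3)                                # initial estimate
for _ in range(10):
    # Jacobian of p -> f(exp(hat(p)) @ R) at p = 0, by central differences
    J = np.zeros((residuals(R).size, 3))
    for i in range(3):
        p = np.zeros(3)
        p[i] = 1e-6
        J[:, i] = (residuals(expm(hat(p)) @ R)
                   - residuals(expm(hat(-p)) @ R)) / 2e-6
    delta = np.linalg.solve(J.T @ J, -J.T @ residuals(R))
    R = expm(hat(delta)) @ R                 # multiplicative update

print(np.linalg.norm(residuals(R)))          # ~0: R has converged to R_true
```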

  • Different matrix Lie groups have different matrix exponentials. I presented, as an example, the matrix exponential for the Lie group of Euclidean vectors. As you said, the matrix exponential for SO(3) is different, but everything else stays the same... (2012-08-16)
1

I think the problem lies in "As soon as we update $\mathtt R^{(m)}$, it no longer corresponds to $\mathbf p = \mathbf 0$". As I understand the answer you're referring to, it still does: $\mathbf p$ is just the increment, not the accumulated parameter. It says

$ \mathtt J :=\left.\frac{\partial f(\exp(\hat{\mathbf p})\cdot\mathtt R^{(m)})}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}\;, $

so $f$ is being evaluated at $\exp(\hat{\mathbf p})$ times the current rotation, and $\mathbf p = \mathbf 0$ does correspond to $\mathtt R^{(m)}$.

  • @Omer: Sorry, I'm not sure I'm following you. Right underneath that formula for $F$ that you quote, an update $\delta$ is introduced. That plays exactly the same role as $\mathbf p$. The derivative is calculated at the old position, $\mathbf x^{(m)}$, corresponding to $\delta=0$; likewise, the derivative is calculated at the old position, $\mathtt R^{(m)}$, corresponding to $\mathbf p=\mathbf0$. I don't see a difference; if you still do, please try to describe it in more detail. (2012-08-10)
0

The paper refers to the parameterization as "local" around the point $R_0$. At every iteration of the minimization, the reference point is updated by applying the rotation associated with the step $\omega$ given by the Newton update.

So, essentially, "small" increments are calculated in a changing reference frame, always with respect to the most recent estimate.

The purpose of the local parameterization is to avoid the pathologies of a global representation (e.g. the gimbal lock of Euler angles), which originate from the fact that the space of rotations is not globally compatible with $\mathbb{R}^3$.