
Referring to a previously asked question, Jacobian matrix of the Rodrigues' formula (exponential map): it was suggested in one of the answers to only calculate the simpler Jacobian $\left.\frac{\partial}{\partial \mathbf p}\exp(\hat{\mathbf p})\right|_{\mathbf p=\mathbf 0}$, with $\exp(\hat{\mathbf 0})$ being the identity.

A similar idea is suggested in the technical report Minimization on the Lie Group SO(3) and Related Manifolds.

Using a Gauss-Newton-based strategy, it was suggested to solve for $\delta$:

$$ \mathtt J^\top\mathtt J \delta = -\mathtt J^\top f(\mathtt R^{(m)})$$

Final update rule: $$ \mathtt R^{(m+1)}= \exp(\hat\delta)\cdot \mathtt R^{(m)}.$$

Gauss-Newton stems from a Taylor series approximation: to optimize an objective function $\mathcal O$, one expands $$\mathcal O(\mathtt R^{(m+1)}) \approx \mathcal O(\mathtt R^{(m)}) + \mathbf g^\top\delta + \tfrac{1}{2}\delta^\top \mathtt H \delta~.$$

The gradient $\mathbf g$ and Hessian $\mathtt H$ are calculated at the point corresponding to $\mathtt R^{(m)}$, which is initially $\mathbf p = \mathbf 0$.

My question: as soon as we update $\mathtt R^{(m)}$, it no longer corresponds to $\mathbf p = \mathbf 0$. So it seems we should calculate the gradient and Hessian at a different $\mathbf p$, i.e. $\mathbf p^{(m+1)} = \delta + \mathbf p^{(m)}$. Can someone explain the idea behind only using the gradient and Hessian at $\mathbf p = \mathbf 0$? (This was suggested in the technical report above and in the question referred to.)
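For concreteness, here is a minimal sketch of one such update step in Python (my own illustration, not taken from the report; the names `hat`, `gauss_newton_step`, and the given residual `f_val` and Jacobian `J` at the current estimate are assumptions of this sketch):

```python
import numpy as np
from scipy.linalg import expm

def hat(p):
    """Map p in R^3 to the skew-symmetric matrix p^ in so(3)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def gauss_newton_step(R, J, f_val):
    """Solve J^T J delta = -J^T f(R), then update R <- exp(delta^) R."""
    delta = np.linalg.solve(J.T @ J, -J.T @ f_val)
    return expm(hat(delta)) @ R
```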

  • You do not seem to have asked a question. (2012-08-09)
  • I have rephrased the question. (2012-08-09)
  • Maybe a good idea is to write a sentence ending in a question mark :-) (2012-08-09)
  • I hope it is clearer now. (2012-08-09)

3 Answers


I will try to resolve the confusion, not with a rigorous mathematical argument, but by illustrating the similarity between standard Gauss-Newton over Euclidean space and Gauss-Newton with respect to a matrix Lie group.

Let us first look at the derivative of a scalar function $f$: $$\frac{\partial f(x)}{\partial x} := \lim_{y \to x} \frac{f(y)-f(x)}{y-x}$$ Now with $$\lim_{y \to x} \frac{f(y)-f(x)}{y-x} \overset{y=p+x}{=} \lim_{p \to 0} \frac{f(p+x)-f(x)}{p} =: \left.\frac{\partial f(p+x)}{\partial p}\right|_{p=0}$$

Thus, we get: $$\frac{\partial f(x)}{\partial x} = \left.\frac{\partial f(p+x)}{\partial p}\right|_{p=0}~.$$

In other words, any derivative $\frac{\partial f(x)}{\partial x}$ can be written as a derivative around zero $\left.\frac{\partial f(p+x)}{\partial p}\right|_{p=0}$. (Source: Derivative as derivative around zero?)
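A quick symbolic sanity check of this identity (my own illustration, using SymPy with an arbitrary smooth test function):

```python
import sympy as sp

x, p = sp.symbols('x p', real=True)
f = sp.sin(x) * sp.exp(x)                      # arbitrary smooth test function

lhs = sp.diff(f, x)                            # d f(x) / dx
rhs = sp.diff(f.subs(x, p + x), p).subs(p, 0)  # d f(p+x) / dp at p = 0
assert sp.simplify(lhs - rhs) == 0             # the two derivatives agree
```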

Using a similar argument, we can show that the same holds for partial derivatives of a function defined over a Euclidean vector space: $$\frac{\partial f(\mathbf x)}{\partial x_i} = \left.\frac{\partial f(\mathbf p+ \mathbf x)}{\partial p_i}\right|_{\mathbf p=0}~.$$

Hence, the standard Gauss-Newton scheme can be rewritten as:

$$ \mathbf J^\top\mathbf J \delta = -\mathbf J^\top f(\mathbf x_m) \quad\text{with}\quad \mathbf J :=\left.\frac{\partial f(\mathbf p+\mathbf x_m)}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}$$

With the update rule: $$ \mathbf x_{m+1}= \delta + \mathbf x_m.$$

Thus, we can conclude that even the standard Gauss-Newton approach can be seen as an optimisation around the identity ($\mathbf p =\mathbf 0$). Obviously, the Jacobian needs to be recalculated after each update!
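To illustrate, here is a minimal sketch (my own, with a made-up two-dimensional least-squares problem) of standard Gauss-Newton in which the Jacobian is always a finite-difference derivative in $\mathbf p$ around $\mathbf 0$, recomputed after every update:

```python
import numpy as np

# Made-up residual: find x with f(x) = 0; any smooth residual would do.
def f(x):
    return np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**2 - 5.0])

def jacobian_around_zero(f, x, eps=1e-6):
    """J[:, i] = d f(p + x) / dp_i at p = 0, by central differences."""
    n = len(x)
    J = np.zeros((len(f(x)), n))
    for i in range(n):
        p = np.zeros(n)
        p[i] = eps
        J[:, i] = (f(p + x) - f(-p + x)) / (2 * eps)
    return J

x = np.array([1.0, 1.0])
for _ in range(20):
    J = jacobian_around_zero(f, x)              # recomputed every iteration
    delta = np.linalg.solve(J.T @ J, -J.T @ f(x))
    x = delta + x                               # update rule: x <- delta + x
print(x, f(x))                                  # converges to (1, 2)
```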


Maybe this answers the question already, but let me continue:

A matrix Lie group can be seen as a generalization of the Euclidean vector space; conversely, the Euclidean vector space is a trivial example of a matrix Lie group. We represent a vector $\mathbf a \in \mathbb{R}^n$ as an $(n+1)\times (n+1)$ matrix $$\mathbf{A} := \begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&1 \end{bmatrix}~.$$ Vector addition becomes matrix multiplication, $$\begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&1 \end{bmatrix}\cdot \begin{bmatrix}\mathbf{I}_{n\times n}&\mathbf b\\\mathbf{O}_{1\times n}&1 \end{bmatrix} = \begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a+\mathbf b\\\mathbf{O}_{1\times n}&1 \end{bmatrix}~,$$ and, with $\hat{\mathbf a} := \begin{bmatrix}\mathbf{O}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&0 \end{bmatrix}$, the matrix exponential takes a particularly simple form, since $\hat{\mathbf a}^2 = \mathbf O$ truncates the exponential series after the linear term: $$ \exp(\hat{\mathbf a}) = \mathbf I + \hat{\mathbf a} = \begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&1 \end{bmatrix}~.$$ Using these new conventions, we can rewrite the Gauss-Newton scheme ($\mathbf A$ instead of $\mathbf a$, matrix multiplication instead of vector addition, $\exp(\hat{\mathbf p})$ instead of $\mathbf p$).
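A quick numerical check of these identities (my own illustration; `embed` and `hat` are just the constructions above):

```python
import numpy as np
from scipy.linalg import expm

def embed(a):
    """Represent a vector a in R^n as the (n+1)x(n+1) matrix A."""
    n = len(a)
    A = np.eye(n + 1)
    A[:n, n] = a
    return A

def hat(a):
    """a^: the Lie-algebra element, with a in the top-right block."""
    n = len(a)
    H = np.zeros((n + 1, n + 1))
    H[:n, n] = a
    return H

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])

# Vector addition becomes matrix multiplication ...
assert np.allclose(embed(a) @ embed(b), embed(a + b))
# ... and, since hat(a) @ hat(a) = 0, the exponential series terminates:
assert np.allclose(expm(hat(a)), embed(a))
```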

Finally, we get the following Jacobian and update rule for the generalized Gauss-Newton scheme:

$$\mathbf J :=\left.\frac{\partial f(\exp(\hat{\mathbf p})\cdot\mathbf A^{(m)})}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}$$

and

$$ \mathbf A^{(m+1)}= \exp(\hat\delta)\cdot \mathbf A^{(m)}.$$
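Putting everything together, here is a minimal sketch of the generalized scheme on SO(3) (my own illustration with a made-up residual, aligning rotated point sets; the Jacobian is obtained by central differences in $\mathbf p$ around $\mathbf 0$):

```python
import numpy as np
from scipy.linalg import expm

def hat(p):
    """Map p in R^3 to the skew-symmetric matrix p^ in so(3)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

# Made-up residual: errors R @ x_i - y_i over known point pairs (X, Y).
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))
R_true = expm(hat(np.array([0.3, -0.2, 0.5])))
Y = R_true @ X

def f(R):
    return (R @ X - Y).ravel()

R, eps = np.eye(3), 1e-6
for _ in range(10):
    # Jacobian of f(exp(p^) @ R) w.r.t. p, evaluated at p = 0.
    J = np.empty((f(R).size, 3))
    for i in range(3):
        p = np.zeros(3)
        p[i] = eps
        J[:, i] = (f(expm(hat(p)) @ R) - f(expm(hat(-p)) @ R)) / (2 * eps)
    delta = np.linalg.solve(J.T @ J, -J.T @ f(R))
    R = expm(hat(delta)) @ R                 # update on the manifold

print(np.allclose(R, R_true, atol=1e-6))     # True: converged to R_true
```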

  • Thanks, it's really comprehensive! But the last part raises another question: the matrix exponential should behave like $\exp: \mathfrak{so}(n) \rightarrow SO(n)$, assuming $\hat{\cdot}: \mathbb{R}^n \rightarrow \mathfrak{so}(n)$, but that doesn't seem to be the case with the result $$ \exp(\hat{\mathbf a}) = \begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&1 \end{bmatrix}~.$$ (2012-08-16)
  • Different matrix Lie groups have different matrix exponentials. I presented, as an example, the matrix exponential for the Lie group of Euclidean vectors. As you said, the matrix exponential for SO(3) is different. But everything else stays the same... (2012-08-16)

I think the problem lies in "As soon as we update $\mathtt R^{(m)}$, it no longer corresponds to $\mathbf p = \mathbf 0$". As I understand the answer you're referring to, it does; $\mathbf p$ is just the change, not the whole thing. It says

$$ \mathtt J :=\left.\frac{\partial f(\exp(\hat{\mathbf p})\cdot\mathtt R^{(m)})}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}\;, $$

so $f$ is being evaluated at $\exp(\hat{\mathbf p})$ times the current rotation, and $\mathbf p = \mathbf 0$ does correspond to $\mathtt R^{(m)}$.

  • But in the case of Euclidean vector spaces we have $$F = f(\mathbf x)^\top f(\mathbf x)$$ and, for Gauss-Newton, $$ \mathtt J :=\left.\frac{\partial f}{\partial \mathbf x} \right|_{\mathbf x =\mathbf x^{(m)}}~. $$ This is the part of the confusion: when adapting this formulation to Lie groups, why does the author restrict the derivative to the tangent space around the identity, $\mathbf p =\mathbf 0$? (2012-08-10)
  • @Omer: Sorry, I'm not sure I'm following you. Right underneath that formula for $F$ that you quote, an update $\delta$ is introduced. That plays exactly the same role as $\mathbf p$. The derivative is calculated at the old position, $\mathbf x^{(m)}$, corresponding to $\delta=0$; likewise, the derivative is calculated at the old position, $\mathtt R^{(m)}$, corresponding to $\mathbf p=\mathbf 0$. I don't see a difference; if you still do, please try to describe it in more detail. (2012-08-10)

The paper refers to the parameterization as "local" around the point $R_0$. At every iteration of the minimization, the reference point is updated by applying the rotation associated with the step $\omega$ given by the Newton update.

So, essentially, "small" increments are calculated in a changing reference frame, always with respect to the last result.

The purpose of the local parameterization is to avoid the pathologies of a global representation (e.g. the gimbal lock of Euler angles), which originate from the incompatibility of the rotation group with $\mathbb{R}^3$.