
Here is Exercise 1.5 from the book *Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics* by J. Solomon:

Suppose $A,B \in \mathbb{R}^{n \times n}$ and $\vec{a},\vec{b} \in \mathbb{R}^{n}$. Find a (nontrivial) linear system of equations satisfied by any $\vec{x}$ minimizing the energy $\|A\vec{x}-\vec{a}\|_{2}^2 + \|B\vec{x}-\vec{b}\|_{2}^2$.

As I see it, the question asks to find a system $C\vec{x}=\vec{c}$ whose solution is any vector $\vec{x}_{opt}$ that minimizes the aforementioned function:

$$f(\vec{x})=||A\vec{x}-\vec{a}||_{2}^2 + ||B\vec{x}-\vec{b}||_{2}^2$$

But I can't figure out how to approach this problem. I tried calculating the gradient of this function, and I looked at the chapter's material on residuals and Lagrange multipliers, but I don't know whether I am going in the right direction.

Could someone give me a tip on how to approach this problem?


Update #1

Using the hints given in the comments, I've come up with something like this: $$ f(\vec{x})=||Ax-a||_{2}^2 + ||Bx-b||_{2}^2 $$ Expanding the norms: $$ f(\vec{x})=||Ax||_2^2 + ||Bx||_2^2 - 2a^TAx - 2b^TBx + ||a||_2^2 + ||b||_2^2 $$ Taking the gradient and setting it to zero: $$ \nabla f(\vec{x})=2A\vec{x} + 2B\vec{x} - 2a^TA - 2b^TB = 0 $$ $$ 2(A + B)\vec{x} - 2(a^TA + b^TB) = 0 $$ $$ (A + B)\vec{x} - (a^TA + b^TB)=0 $$ $$ \vec{x}_{opt}=(A + B)^{-1}(a^TA + b^TB) $$

Is it correct?


Update #2

Oh, I see. The derivative was taken in the wrong way. Here is how it should be (as noted in Walter's answer): $$ f(\vec{x})=x^{\top}A^{\top}Ax + x^{\top}B^{\top}Bx - 2a^{\top}Ax - 2b^{\top}Bx + a^{\top}a + b^{\top}b $$ $$ \nabla f(\vec{x})=2A^{\top}Ax + 2B^{\top}Bx - 2A^{\top}a-2B^{\top}b=0 $$ $$ \vec{x}_{opt}=(A^{\top}A + B^{\top}B)^{-1}(A^{\top}a + B^{\top}b) $$
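The normal equations $(A^{\top}A + B^{\top}B)\vec{x} = A^{\top}a + B^{\top}b$ can be sanity-checked numerically; here is a minimal sketch with random data (the dimension, seed, and perturbation size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
a, b = rng.standard_normal(n), rng.standard_normal(n)

# Solve the normal equations (A^T A + B^T B) x = A^T a + B^T b.
x_opt = np.linalg.solve(A.T @ A + B.T @ B, A.T @ a + B.T @ b)

def f(x):
    """The energy ||Ax - a||^2 + ||Bx - b||^2."""
    return np.sum((A @ x - a) ** 2) + np.sum((B @ x - b) ** 2)

# f is convex, so perturbing x_opt in random directions should never decrease f.
assert all(
    f(x_opt + 1e-3 * rng.standard_normal(n)) >= f(x_opt) - 1e-12
    for _ in range(100)
)
```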

  • Use the identity $\|v\|_2^2 = v^{\top}v$, expand the result, and take the gradient. (2017-01-16)
  • I've updated the original question. Could you please tell me whether it is correct? (2017-01-24)
  • Expand the norms using $\|X\|_2^2 = X^{\top}X$ for any expression $X$ that occurs, then sum up, differentiate, set the result equal to $0$, and solve for whatever you want to solve. (2017-01-24)
  • It seems I'm having a rough time with vector/matrix derivatives: I calculated the derivatives incorrectly after the norms had been expanded. (2017-01-24)
  • There is a small compendium for that somewhere on the internet; The Matrix Cookbook, I think it is called. (2017-01-25)
  • Yes, exactly that name: http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf (2017-01-31)

2 Answers


$\textbf{Hint}$:

If you know the general least squares solution of $\min_x \|Ax-b\|_2^2$, which is \begin{equation} \hat{x}=(A^{\top}A)^{-1}A^{\top}b, \end{equation} then maybe you can rewrite your problem as a single-objective least squares problem. Remember that the matrix $A$ does not need to be square; it can also be tall, i.e. $A\in \mathbb{R}^{m\times n}$ with $m>n$.

Do you have a clue what I am hinting at? Otherwise, just let me know.

Edit:

If we go back to your problem of minimizing $\|Ax-a\|_2^2+\|Bx-b\|_2^2$, note that we can rewrite it as \begin{equation} \left\Vert\begin{bmatrix} A\\ B \end{bmatrix}x - \begin{bmatrix} a\\ b \end{bmatrix}\right\Vert_2^2=\|Cx-c\|_2^2. \end{equation} The normal equations $C^{\top}Cx = C^{\top}c$, i.e. $(A^{\top}A+B^{\top}B)x = A^{\top}a+B^{\top}b$, are exactly the linear system the exercise asks for, and the least squares solution becomes (assuming $C$ has full column rank) \begin{equation} \hat{x}=(C^{\top}C)^{-1}C^{\top}c= (A^{\top}A+B^{\top}B)^{-1}(A^{\top}a+B^{\top}b) \end{equation}
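The stacking identity can be checked numerically; here is a sketch with NumPy (the dimension and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
a, b = rng.standard_normal(n), rng.standard_normal(n)

# Stack the two objectives into a single tall least squares problem ||Cx - c||^2.
C = np.vstack([A, B])          # shape (2n, n)
c = np.concatenate([a, b])     # shape (2n,)

# Ordinary least squares on the stacked system...
x_stacked, *_ = np.linalg.lstsq(C, c, rcond=None)

# ...matches the normal equations (A^T A + B^T B) x = A^T a + B^T b.
x_normal = np.linalg.solve(A.T @ A + B.T @ B, A.T @ a + B.T @ b)
assert np.allclose(x_stacked, x_normal)
```

In practice one would solve the stacked system with `lstsq` (or a QR factorization of $C$) rather than forming $C^{\top}C$ explicitly, since squaring the matrix squares its condition number.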

  • So you think that the system of equations this exercise asks for, i.e. $C\vec{x}=\vec{c}$, is actually the formulation of a least squares problem with _two_ minimization objectives, right? (2017-01-24)
  • I made an edit; hope it helps. (2017-01-24)
  • Yes, it seems the derivatives were taken incorrectly. Now it looks almost like your solution (the only difference was the multiplication order in the second factor). Thank you! (2017-01-24)

Just set the gradient equal to $0$, which immediately yields $$ 2A^{\top}(Ax - a) + 2B^{\top}(Bx - b) = 0. $$ Done!

Here are some hints to take the gradient easily. If $g(x) = \| x \|^2$, then $\nabla g(x) = 2x$. Also, if $h(x) = g(Mx)$, then $\nabla h(x) = M^T \nabla g(Mx)$. (This follows from the chain rule.)
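These two rules are easy to verify with a finite-difference check; a minimal sketch (since $f$ is quadratic, central differences recover the gradient up to rounding error):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
a, b = rng.standard_normal(n), rng.standard_normal(n)
x = rng.standard_normal(n)

def f(x):
    """The energy ||Ax - a||^2 + ||Bx - b||^2."""
    return np.sum((A @ x - a) ** 2) + np.sum((B @ x - b) ** 2)

# Analytic gradient from the chain rule: 2 A^T (Ax - a) + 2 B^T (Bx - b).
grad = 2 * A.T @ (A @ x - a) + 2 * B.T @ (B @ x - b)

# Central finite differences along each coordinate direction.
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])
assert np.allclose(grad, fd, atol=1e-4)
```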

  • Yep, that's right. Doing it by hand via dot product identities seems quite inefficient. And nobody should do it entry by entry with a _real_ matrix >_< (2017-01-31)