
I have an application where I have to minimize a cost function of the form:
$J(\mathbf x) = \| \mathbf A \mathbf x - \mathbf b \|^2 \quad (1)$

By setting the gradient of $J$ to zero, I derived that I have to solve the system of equations:
$\mathbf A^T \mathbf A \mathbf x = \mathbf A^T \mathbf b \quad (2)$

Now my question is, when can I solve the following system instead?
$\mathbf A \mathbf x = \mathbf b \quad (3)$

From my point of view, this depends on $\mathbf A$ being invertible. In my application $\mathbf A$ is a square matrix of the form $\mathbf A = \mathbf I - \mathbf W$, where $\mathbf I$ is the identity matrix and $\mathbf W$ is a square matrix with zeros on the main diagonal and small values on a few secondary diagonals. The values of $\mathbf W$ are arbitrary but normalized, so that each row of $\mathbf W$ sums to 1. However, some rows of $\mathbf W$ can also be entirely zero.

For example $\mathbf A$ may look like this:
$\mathbf A = \pmatrix{ 1 & -0.2 & 0 & 0 & -0.3 & -0.2 & -0.3 & 0 & 0 \cr -0.1 & 1 & -0.2 & 0 & 0 & -0.4 & -0.2 & -0.1 & 0 \cr 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \cr 0 & 0 & -0.2 & 1 & -0.1 & 0 & 0 & -0.6 & -0.1 \cr -0.2 & 0 & 0 & -0.2 & 1 & -0.5 & 0 & 0 & -0.1 \cr -0.4 & -0.3 & 0 & 0 & -0.1 & 1 & -0.2 & 0 & 0 \cr -0.1 & -0.1 & -0.1 & 0 & 0 & -0.6 & 1 & -0.1 & 0 \cr 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \cr 0 & 0 & -0.2 & -0.4 & -0.1 & 0 & 0 & -0.3 & 1 \cr }$
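For concreteness, here is a small NumPy sketch (my own check, not part of the application code) that encodes this example matrix and verifies the stated structure:

```python
import numpy as np

# Example matrix A = I - W copied from above (rows of W sum to 1,
# except the two zero "constraint" rows, here rows 3 and 8).
A = np.array([
    [ 1.0, -0.2,  0.0,  0.0, -0.3, -0.2, -0.3,  0.0,  0.0],
    [-0.1,  1.0, -0.2,  0.0,  0.0, -0.4, -0.2, -0.1,  0.0],
    [ 0.0,  0.0,  1.0,  0.0,  0.0,  0.0,  0.0,  0.0,  0.0],
    [ 0.0,  0.0, -0.2,  1.0, -0.1,  0.0,  0.0, -0.6, -0.1],
    [-0.2,  0.0,  0.0, -0.2,  1.0, -0.5,  0.0,  0.0, -0.1],
    [-0.4, -0.3,  0.0,  0.0, -0.1,  1.0, -0.2,  0.0,  0.0],
    [-0.1, -0.1, -0.1,  0.0,  0.0, -0.6,  1.0, -0.1,  0.0],
    [ 0.0,  0.0,  0.0,  0.0,  0.0,  0.0,  0.0,  1.0,  0.0],
    [ 0.0,  0.0, -0.2, -0.4, -0.1,  0.0,  0.0, -0.3,  1.0],
])

W = np.eye(9) - A
print(W.sum(axis=1))                                 # each row of W sums to 1, except the zero rows
print(np.linalg.matrix_rank(A), np.linalg.cond(A))   # is A full rank? how well conditioned?
```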

Have I argued correctly? If so, how can I show that $\mathbf A$ is invertible? Or is there another argument for solving (3) instead of (2)?

I tried to solve both systems (2) and (3) with MATLAB and the Intel MKL, but surprisingly only (3) gave me the expected result. I would expect (2) to work as well. Could this be a numerical problem?
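For illustration, here is a minimal NumPy sketch (not my actual MATLAB/MKL code; the matrix here is a dense random one with the same flavour rather than a banded one) of the comparison between solving (3) directly and forming the normal equations (2):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 9

# Build a matrix with the same flavour as A = I - W above:
# W has zero diagonal, nonnegative entries, rows summing to 1,
# and a couple of zero "constraint" rows.
W = rng.uniform(0.0, 1.0, (n, n))
np.fill_diagonal(W, 0.0)
W[[2, 7], :] = 0.0
nonzero = W.sum(axis=1) > 0
W[nonzero] /= W[nonzero].sum(axis=1, keepdims=True)
A = np.eye(n) - W
b = rng.standard_normal(n)

x_direct = np.linalg.solve(A, b)                  # system (3): A x = b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # system (2): normal equations

print(np.linalg.cond(A), np.linalg.cond(A.T @ A))    # the condition number roughly squares
print(np.linalg.norm(x_direct - x_normal))           # small when A is well conditioned
```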

  • @Gabriel: "well-behaved", not necessarily. As p.s. says in his comment, forming the cross-product matrix (roughly) squares the condition number, which leads to some loss of significant figures in the process of forming the solution. Consider for instance the Hilbert matrix, which is both symmetric positive definite **and** ill-conditioned. (2012-07-21)
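A quick illustration of this comment (my own snippet, using SciPy's `hilbert`):

```python
import numpy as np
from scipy.linalg import hilbert   # symmetric positive definite, notoriously ill-conditioned

for n in (3, 4, 5, 6):
    H = hilbert(n)
    print(n, np.linalg.cond(H), np.linalg.cond(H.T @ H))
# cond(H^T H) comes out as roughly cond(H)**2, so forming the normal equations
# loses about twice as many significant digits as working with H directly.
```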

4 Answers


Well, if there exists an $x$ such that $Ax = b$, then of course the minimum of $J(x)$ will be $0$ (because $J(x) \geq 0 ~\forall x$). Such an $x$ exists if and only if $\operatorname{rank}(A) = \operatorname{rank}([A \mid b])$. So it doesn't depend on $A$ being invertible, but rather on whether $b$ "adds" to the rank of $A$.

If no such $x$ exists, you have to solve $A^TAx = A^Tb$ instead.
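A minimal sketch of this rank test (the function name and tolerance handling are my own):

```python
import numpy as np

def is_consistent(A, b, tol=None):
    """Rank test described above: A x = b has an exact solution
    iff rank(A) == rank([A | b])."""
    return np.linalg.matrix_rank(A, tol) == np.linalg.matrix_rank(np.column_stack([A, b]), tol)

# If consistent, min J = 0 and any solver for A x = b works; otherwise fall
# back to the normal equations or a least-squares routine, e.g.
# x, *_ = np.linalg.lstsq(A, b, rcond=None)
```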


Just some thoughts: The matrix $A$ is weakly diagonally dominant. If one of the rows of $W$ is trivial (zero), then $A$ is even irreducibly diagonally dominant and hence, by the Gershgorin circle theorem, regular (non-singular). In that case $Ax=b$ has a unique solution.

If none of the rows of $W$ is trivial, then the matrix is surely singular, since the columns sum to the zero vector (each row of $A$ sums to zero, so $A\mathbf 1 = \mathbf 0$). However, the rank will be $n-1$, where $n$ is the size of the matrix. Indeed, we may pick a column with at least one negative entry. If we remove this column and the corresponding row, we obtain yet another irreducibly diagonally dominant matrix, which is regular.

Hence the image of $x \mapsto Ax$ is a hyperplane in $\mathbb R^n$, and the minimum of $J$ is the squared distance of $b$ to that hyperplane.

Edit: I am assuming that $A$ is irreducible. I actually have no idea whether this is OK in practice or whether restricting to "irreducible components" is a sensible thing to do.

Edit 2: If I am not mistaken, there is (up to sign) exactly one unit normal vector $v$ orthogonal to the image of $A$ (that is, in the case where $A$ is singular). Then the minimum of $J$ should be just the square of the scalar product of $b$ and $v$. The vector $v$ is orthogonal to all columns of $A$, which is the same as lying in the kernel of $A^T$.
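A minimal NumPy sketch of this "Edit 2" idea (assuming the square, rank-$(n-1)$ situation described above; the function name is mine):

```python
import numpy as np

def residual_via_left_null_vector(A, b):
    """Square, rank n-1 case sketched above: the left singular vector of A
    for the (near-)zero singular value spans ker(A^T); the least-squares
    residual is the component of b along that direction."""
    U, s, Vt = np.linalg.svd(A)
    v = U[:, -1]                                  # unit vector with A^T v ~ 0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)     # any minimizer of J
    return abs(v @ b), np.linalg.norm(A @ x - b)  # these should agree; min J is their square
```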

  • Good point. Actually, the trivial rows of $\mathbf W$ come from a constraint fixing the value of certain $\mathbf x_i$. It makes sense that there is no solution without this constraint. Also, the property of diagonal dominance seems to be helpful to show that $\mathbf A$ is non-singular. I'll think about that! Thanks. (2012-07-20)

Let's look at three cases of the problem $\mathbf{A}\mathbf{x} = \mathbf{b}$:

  1. $\mathbf{A}$ is square
  2. $\mathbf{A}$ is non-square and has row-dimension > column-dimension
  3. $\mathbf{A}$ is non-square and has column-dimension > row-dimension

Of course, these are the only three possibilities.

For simplicity, assume that $\mathbf{A}$ has full rank, i.e. that its rows or its columns (whichever are fewer) are linearly independent.

In case (1), the matrix is square and of full rank. Therefore, $\mathbf{A}$ is invertible, and you can exactly compute $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$.

In case (2), you have an overdetermined system. Think about it as if you've taken 4 measurements, with some possible measurement error, to solve for 2 unknowns. In this case, you cannot solve the system directly, but you can exactly compute the solution to $\mathbf{A}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T\mathbf{b}$. This is exactly the same thing as computing a linear regression.

In case (3), you have an underdetermined system. This is like trying to solve for 3 unknowns with only 2 equations. Very tricky indeed! In such a case, $\mathbf{A}^T\mathbf{A}$ is singular, so $\mathbf{A}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T\mathbf{b}$ does not determine $\mathbf{x}$ uniquely. You have to pick one of the infinitely many minimizers, for example the minimum-norm solution, or impose some additional criterion via an optimization algorithm.

How does this apply to your problem?

Well, you can completely determine your algorithmic approach by looking at the dimensions of the column space and row space of your matrix $\mathbf{A}$. By computing the number of linearly independent row/column vectors, you can easily determine whether $\mathbf{A}$ represents a solvable, underdetermined, or overdetermined system, and choose your algorithmic approach based on that, as sketched below.
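A minimal sketch of such a dispatch (the function name and the use of `numpy.linalg.lstsq` are my own choices, not prescribed by the problem):

```python
import numpy as np

def choose_solver(A, b):
    """Illustrative dispatcher: pick a strategy from the shape and rank of A,
    along the lines of cases (1)-(3) above."""
    m, n = A.shape
    if m == n and np.linalg.matrix_rank(A) == n:
        return np.linalg.solve(A, b)       # case (1): square and invertible
    # Cases (2) and (3) (and rank-deficient square matrices): lstsq minimizes
    # ||A x - b|| and returns the minimum-norm minimizer.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```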

  • Of course, $\mathbf{Ax'} = \mathbf{b}$ is nonsense, because the solution $\mathbf{x}$ to $\mathbf{Ax} = \mathbf{b}$ is unique. (2012-07-20)

First, two trivial remarks: any solution of $\mathbf A\mathbf x= \mathbf b$ is of course also a solution of $\mathbf A^T \mathbf A \mathbf x = \mathbf A^T\mathbf b $, and if $\mathbf A$ is invertible then the converse is also true (because $\mathbf A^T$ is then also invertible and its inverse can be applied to both sides of the equation).

More is true though. Whether $\mathbf A$ is invertible or not, and even if it is non-square: if $\mathbf A\mathbf x= \mathbf b$ has any solutions, then $\mathbf A^T \mathbf A \mathbf x = \mathbf A^T\mathbf b $ has exactly the same solutions. However if $\mathbf A\mathbf x= \mathbf b$ has no solution then $\mathbf A^T \mathbf A \mathbf x = \mathbf A^T\mathbf b $ will still have solutions; a trivial example is when $\mathbf A$ is zero but $\mathbf b$ is not.

Here is the precise description of the situation. Let $c$ be the length of the columns of $\mathbf A$ (and of $\mathbf b$) and let $r\leq c$ be the rank of $\mathbf A$, which is the dimension of the image $I$ of the map $\mathbf x\mapsto\mathbf A\mathbf x$. By the rank-nullity theorem (and the fact that the rank of $\mathbf A^T$ is also $r$), the kernel $K$ of $\mathbf y\mapsto\mathbf A^T\mathbf y$ has co-dimension $r$ in $\mathbb R^c$ (and so dimension $c-r$). A crucial observation is that the subspaces $I,K$ of $\mathbb R^c$ form a direct sum (and $I\oplus K=\mathbb R^c$, due to the dimensions just mentioned): to be in $I$ means being a linear combination of the columns of $\mathbf A$, and to be in $K$ means being orthogonal to all columns of $\mathbf A$, and the two are compatible only for the zero vector. Therefore $(\mathbf b+K)\cap I$ consists of a single vector $\mathbf b'\in\mathbb R^c$, and the solutions of $\mathbf A^T \mathbf A \mathbf x = \mathbf A^T\mathbf b$ are those of $\mathbf A \mathbf x = \mathbf b'$: indeed $\mathbf A^T \mathbf A \mathbf x = \mathbf A^T\mathbf b$ says that $\mathbf A\mathbf x-\mathbf b\in K$, while $\mathbf A\mathbf x\in I$ always.

Now if $\mathbf b=\mathbf b'$ this shows that the two equations of the question are equivalent, and this situation arises if and only if $\mathbf b\in I$. However if $\mathbf b\neq\mathbf b'$, so $\mathbf b\notin I$ and $\mathbf A \mathbf x = \mathbf b$ has no solution, then $\mathbf A \mathbf x = \mathbf b'$ still has solutions because $\mathbf b'\in I$ by construction. If $d$ is the length of the rows of $A$, the dimension of the set of solutions of $\mathbf A^T \mathbf A \mathbf x = \mathbf A^T\mathbf b$ is $d-r$ in all cases.
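A small NumPy check of this description (my own construction, using a deliberately rank-deficient $\mathbf A$): compute $\mathbf b'$ as the orthogonal projection of $\mathbf b$ onto $I$ and verify that a solution of the normal equations solves $\mathbf A\mathbf x=\mathbf b'$ exactly (up to round-off):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # 5x4, rank at most 3
b = rng.standard_normal(5)

# b' = orthogonal projection of b onto I = column space of A,
# computed here via the pseudoinverse: b' = A A^+ b.
b_prime = A @ (np.linalg.pinv(A) @ b)

# A solution x of the normal equations A^T A x = A^T b ...
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# ... solves A x = b' exactly, as described above.
print(np.linalg.norm(A @ x - b_prime))
```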