10

For positive semi-definite matrices $A,B$, how can I find an $X$ that minimizes $\text{Trace}(AX^TBX)$ under either one of these constraints:

a) The sum of squared Euclidean distances between pairs of rows of $X$ is a constant $\nu$.

or

b) $X$ is orthogonal.

This is not a homework problem, though the style of writing might suggest so. All the matrices have real entries; $A$ and $B$ are square, while $X$ is rectangular. Thanks.

This is what I have:

Define $B=F^{T}F$. Define $Y=FX$. You get the above problem as \begin{align} \min_{Y}~ \text{trace}(AY^{T}Y) \end{align}

But now I want $X^*$ that minimizes the original problem. This is what is confusing me!
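
A quick numerical sanity check of this rewriting (a sketch in Python; the sizes and the random positive semidefinite test matrices are my own choices, and, as the comments below point out, the constraint on $X$ need not carry over to $Y$):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3                       # X is n x m (rectangular), A is m x m, B is n x n

# Random positive semidefinite test matrices
A = rng.standard_normal((m, m)); A = A @ A.T
B = rng.standard_normal((n, n)); B = B @ B.T
X = rng.standard_normal((n, m))

# Factor B = F^T F via a symmetric square root (Cholesky also works if B is positive definite)
w, V = np.linalg.eigh(B)
F = (V * np.sqrt(np.clip(w, 0, None))) @ V.T

Y = F @ X
print(np.isclose(np.trace(A @ X.T @ B @ X), np.trace(A @ Y.T @ Y)))   # True
```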

  • 1
    Four(!!!!) exclamation points!!!! Why???? :-)2012-11-17
  • 0
    Excited after a pompous lunch on a Saturday- ;) Just earnest and not to be unprofessional on this respectable yet fun site ;) !!!2012-11-17
  • 0
    Edited the question to add what I have, and where I am stuck.2012-11-17
  • 0
    I don't think you can absorb $F$ into $X$, because $F$ is not necessarily orthogonal.2012-11-18
  • 1
    If you are talking about minimizing $\mathrm{trace}(AX^TBX)$ subject to "(a) or (b)", this seems to be a nonsmooth problem that is very hard to solve. However, if you want to do minimization separately in each of the two constrained cases, then situation (b) more or less follows the conventional line of attack, i.e. orthogonally diagonalize $A$ and $B$ first, and absorb the orthogonal matrices into $X$. The minimizer $X$ is then some permutation matrix that minimizes the dot product between the permuted diagonals of $A$ and $B$.2012-11-18
  • 0
    Can you answer with the mathematical notation for minimizing under situation (b)? I get a feel for the direction you have mentioned, but reading the text is making it confusing. Thanks!2012-11-18
  • 0
    Also, is there any other constraint you would suggest so that the solution for $X$ is constrained to be non-trivial (other than the zero matrix)? This might be a way to have a simpler minimization problem.2012-11-18
  • 0
    @qlinck: I have no idea how to tackle (a). In fact, if $\nu$ is an arbitrary constant, I doubt we can really find a matrix $X$ that satisfies (a).2012-11-19

2 Answers

12

If you want to solve the minimization problem individually in each of the two constrained cases, then (b) can be reduced to a long-solved problem. [Edit: by "(b) $X$ is orthogonal", I suppose you mean the rows (or columns, whichever the shape of $X$ allows) of $X$ are orthonormal. If they are merely required to be mutually orthogonal instead of orthonormal, then the minimum trace is clearly $0$, which is attained when $X=0$.]

Suppose $A$ is $m\times m$ and $B$ is $n\times n$ (hence $X$ is $n\times m$). We may assume that $A,B$ and $X$ have the same size, because if $X$ is "wide" ($n<m$, so that the constraint in (b) reads $XX^T=I_n$, i.e. $X$ has orthonormal rows), we can append $m-n$ further rows $Y$ to $X$ so that $Q=\begin{pmatrix}X\\ Y\end{pmatrix}$ is a real orthogonal matrix, and enlarge $B$ to the $m\times m$ matrix $\tilde{B}=\begin{pmatrix}B&0\\ 0&0\end{pmatrix}$. Then $\mathrm{tr}(AX^TBX)=\mathrm{tr}(AQ^T\tilde{B}Q)$ with $$X=(I_n,\,0_{n\times(m-n)})Q.$$ A similar transformation can be performed if $X$ is "tall" and the size of $B$ is larger than the size of $A$.
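
Here is a small numerical sketch of this padding trick for a wide $X$ with orthonormal rows (the sizes and the random positive semidefinite test matrices are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4                                   # X is n x m ("wide"), A is m x m, B is n x n

A = rng.standard_normal((m, m)); A = A @ A.T
B = rng.standard_normal((n, n)); B = B @ B.T

# A wide X with orthonormal rows (X X^T = I_n)
X = np.linalg.qr(rng.standard_normal((m, n)))[0].T

# Complete the rows of X to an orthonormal basis of R^m, so Q = [X; Y] is orthogonal
_, _, Vt = np.linalg.svd(X)
Y = Vt[n:]                                    # rows spanning the orthogonal complement of the row space of X
Q = np.vstack([X, Y])

# Enlarge B with zeros to an m x m matrix B_tilde
B_tilde = np.zeros((m, m))
B_tilde[:n, :n] = B

print(np.allclose(np.trace(A @ X.T @ B @ X), np.trace(A @ Q.T @ B_tilde @ Q)))   # True
print(np.allclose(X, np.hstack([np.eye(n), np.zeros((n, m - n))]) @ Q))          # True by construction
```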

Next, as $A$ and $\tilde{B}$ are positive semidefinite, we may orthogonally diagonalize them to their eigenvalue matrices. So, let $A=U\Lambda U^T$ and $\tilde{B}=V\Sigma V^T$, where the eigenvalues (i.e. diagonal entries) of $\Lambda$ are arranged in ascending order and those of $\Sigma$ are arranged in descending order: \begin{align} \Lambda&=\mathrm{diag}(\lambda_1,\ldots,\lambda_m):=\mathrm{diag}(\lambda^\uparrow_1(A),\ldots,\lambda^\uparrow_m(A)),\\ \Sigma&=\mathrm{diag}(\sigma_1,\ldots,\sigma_m):=\mathrm{diag}(\lambda^\downarrow_1(\tilde{B}),\ldots,\lambda^\downarrow_m(\tilde{B})). \end{align} Putting $\tilde{Q}=V^TQU$, we get $$\min_Q\mathrm{tr}(AQ^T\tilde{B}Q)=\min_{\tilde{Q}}\mathrm{tr}(\Lambda\tilde{Q}^T\Sigma\tilde{Q}),$$ where both minima are taken over real orthogonal matrices. In other words, by absorbing $U$ and $V$ into $Q$, we may further assume that $A$ and $\tilde{B}$ are nonnegative diagonal matrices.
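
A quick numerical check of this absorption step (a sketch; the size and the random positive semidefinite test matrices are my own choices, and `Bt` below just plays the role of any $m\times m$ matrix $\tilde B$):

```python
import numpy as np

rng = np.random.default_rng(5)
m = 5

A = rng.standard_normal((m, m)); A = A @ A.T        # both matrices square of the same size
Bt = rng.standard_normal((m, m)); Bt = Bt @ Bt.T    # plays the role of B_tilde

eigA, U = np.linalg.eigh(A)                          # Lambda ascending (eigh's default order)
eigB, V = np.linalg.eigh(Bt)
V, eigB = V[:, ::-1], eigB[::-1]                     # Sigma descending
Lam, Sig = np.diag(eigA), np.diag(eigB)

Q = np.linalg.qr(rng.standard_normal((m, m)))[0]     # any orthogonal Q
Qt = V.T @ Q @ U                                     # absorb U and V into Q

print(np.isclose(np.trace(A @ Q.T @ Bt @ Q), np.trace(Lam @ Qt.T @ Sig @ Qt)))   # True
```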

We have transformed our problem into a nicer form. It is now time to solve it. There are two popular approaches to solving this problem. One looks more elegant and the other has wider applicability.

The more elegant approach makes use of Birkhoff's Theorem. First, let $\tilde{Q}=(q_{ij})$. Then $\tilde{Q}\circ \tilde{Q}=(q_{ij}^2)$ is a doubly stochastic matrix. Therefore \begin{align} \min_{\tilde{Q}\tilde{Q}^T=I} \mathrm{tr}(\Lambda \tilde{Q}^T\Sigma\tilde{Q}) &=\min_{\tilde{Q}\tilde{Q}^T=I} \sum_{i,j}\sigma_i\lambda_jq_{ij}^2\\ &\ge\min_S \sum_{i,j}\sigma_i\lambda_js_{ij};\ S=(s_{ij}) \textrm{ is doubly stochastic}\,\ldots(\ast). \end{align} Observe that $\sum_{i,j}\sigma_i\lambda_js_{ij}$ is a linear function of $S$ and the set of all doubly stochastic matrices is compact and convex. Hence $\min\limits_S \sum_{i,j}\sigma_i\lambda_js_{ij}$ must occur at an extreme point of this convex set. However, by Birkhoff's Theorem, the extreme points of this convex set are exactly the permutation matrices. And a permutation matrix, of course, is real orthogonal. Therefore equality holds in $(\ast)$ above, and $\min\limits_{\tilde{Q}\tilde{Q}^T=I} \mathrm{tr}(\Lambda \tilde{Q}^T\Sigma\tilde{Q})$ is the minimum of $\mathrm{tr}(\Lambda P^T\Sigma P)$ over all permutation matrices $P$. As the diagonal entries of $\Lambda$ and $\Sigma$ are nonnegative and arranged in opposite orders, the rearrangement inequality shows that the global minimum occurs at $\tilde{Q}=P=I$, i.e. $Q=VU^T$, which translates to $X=(I_n,\,0_{n\times(m-n)})VU^T$ when $X$ is wide.
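
Putting the pieces together, here is a short numerical sketch of the resulting closed-form minimizer for a wide $X$ with orthonormal rows. The sizes, the random test matrices and the helper name `objective` are my own choices; the construction $X^*=(I_n,\,0_{n\times(m-n)})VU^T$ and the minimum value $\sum_i\lambda^\uparrow_i(A)\,\lambda^\downarrow_i(\tilde B)$ are as derived above.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 4                                    # X is n x m with orthonormal rows

A = rng.standard_normal((m, m)); A = A @ A.T   # random positive semidefinite test matrices
B = rng.standard_normal((n, n)); B = B @ B.T

def objective(X):
    return np.trace(A @ X.T @ B @ X)

# Enlarge B to an m x m matrix, as in the reduction above
B_tilde = np.zeros((m, m)); B_tilde[:n, :n] = B

# Eigen-decompositions: eigenvalues of A ascending, columns of V ordered so that
# the eigenvalues of B_tilde come descending (eigB itself stays ascending)
eigA, U = np.linalg.eigh(A)
eigB, V = np.linalg.eigh(B_tilde)
V = V[:, ::-1]

# Closed-form minimizer and the predicted minimum value
X_star = np.hstack([np.eye(n), np.zeros((n, m - n))]) @ V @ U.T
print(objective(X_star), np.dot(eigA, eigB[::-1]))   # the two numbers agree

# No random feasible X (orthonormal rows) should do better
for _ in range(1000):
    X = np.linalg.qr(rng.standard_normal((m, n)))[0].T
    assert objective(X) >= objective(X_star) - 1e-9
```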

The seemingly less elegant approach makes use of calculus. The major steps are as follows:

  1. Setting the first derivative of $\mathrm{tr}(\Lambda \tilde{Q}^T \Sigma \tilde{Q})$ (with respect to $\tilde{Q}$, subject to $\tilde{Q}^T\tilde{Q}=I$) to zero gives $\mathrm{tr}(-\Lambda \tilde{Q}^TK\Sigma \tilde{Q} + \Lambda \tilde{Q}^T \Sigma K\tilde{Q})=0$ for every skew-symmetric matrix $K$. Hence $M=-\Sigma \tilde{Q}\Lambda \tilde{Q}^T + \tilde{Q}\Lambda \tilde{Q}^T \Sigma$ is a symmetric matrix.
  2. By definition, however, $M$ is skew-symmetric. Hence $M=0$, i.e. $\tilde{Q}\Lambda \tilde{Q}^T\Sigma$ is symmetric.
  3. Now, suppose the $2n$ diagonal entries in $\Lambda$ and $\Sigma$ are all nonzero and distinct. Since $\tilde{Q}\Lambda \tilde{Q}^T\Sigma$ is symmetric, $\tilde{Q}$ must be a permutation matrix (up to the signs of its nonzero entries, which do not affect the trace). Hence we obtain the same result as in the previous approach. When the entries of $\Lambda$ and $\Sigma$ are not all distinct, we can consider two sequences of nonnegative diagonal matrices $\{\Lambda_k\}$ and $\{\Sigma_k\}$ such that $\Lambda_k\rightarrow\Lambda,\ \Sigma_k\rightarrow\Sigma$ as $k\rightarrow\infty$, and for each $k$ the $2n$ diagonal entries of $\Lambda_k$ and $\Sigma_k$ are nonzero and distinct. So, we may arrive at our desired conclusion by a continuity argument.

This second approach requires more work (because we have no nice theorem to depend on), but I think the technique involved (matrix calculus) is applicable to more problems and hence more useful.
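
To complement the derivation, here is a crude numerical sketch of this first-order approach on the reduced diagonal problem: plain gradient steps in $\tilde Q$, each followed by a projection back onto the orthogonal group via the polar factor. The test eigenvalues, the step size, the iteration count and the helper names `f` and `polar` are my own choices and not part of the argument above.

```python
import numpy as np

rng = np.random.default_rng(4)

Lam = np.diag([1.0, 2.0, 3.0, 4.0])          # hypothetical eigenvalues of A, ascending
Sig = np.diag([4.5, 3.5, 2.5, 1.5])          # hypothetical eigenvalues of B, descending

def f(Qt):
    return np.trace(Lam @ Qt.T @ Sig @ Qt)

def polar(M):
    """Nearest orthogonal matrix to M (the polar factor, via the SVD)."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

Qt = polar(rng.standard_normal((4, 4)))      # random orthogonal starting point
eta = 1e-2
for _ in range(20_000):
    grad = 2 * Sig @ Qt @ Lam                # Euclidean gradient of f at Qt
    Qt = polar(Qt - eta * grad)              # gradient step, then project back onto O(4)

# From a generic start, Qt converges to a signed version of the identity permutation,
# and f(Qt) approaches the closed-form minimum 1*4.5 + 2*3.5 + 3*2.5 + 4*1.5 = 25.
print(np.round(Qt, 2))
print(f(Qt))
```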

  • 0
    What is $Y$ in here? Also, if you meant that $Y=FX$, then the greater issue here is that $F$ is singular!2012-11-19
  • 0
    No, $Y^T$ are some mutually orthogonal columns that are orthogonal to the columns of $X^T$ too.2012-11-19
  • 0
    Ok. Am in the office unfortunately- but will definitely get back with any questions or accept this towards the evening. Will do a read and think about it meanwhile.2012-11-19
  • 0
    What is $S_m$ and $\sigma(i)$ ? Also, how do we find a $Y$ procedurally so that the minimization problem can be expressed as suggested-as $Q$ depends on the chosen $Y$?2012-11-19
  • 0
    $S_m$ is the symmetric group of order $m$, i.e. the set of all permutations of $\{1,2,\ldots,m\}$; $\sigma(i)$ is the permuted value in the $i$-th place. See [Wikipedia](http://en.wikipedia.org/wiki/Permutation#Notation), for example. And we don't need to compute $Y$. It is just a device for transforming the minimization problem over rectangular matrices to one over square matrices.2012-11-19
  • 0
    How would we find the $X^*$ that minimizes the original problem towards the end, as in an arg min? That is my original question! Let me know if you would like to continue in chat.2012-11-19
  • 0
    This is very clear. I will think it over during the day and get back later if I find any concerns, or will accept. Thanks for the time being!2012-11-19
  • 0
    Does the closed-form solution give an optimal solution? I was wondering whether the solution of your proposed formulation is a minimizer of $\sum_{i,j}\sigma_i\lambda_jq_{ij}^2$ under some other constraint (and what that constraint should be) to prevent a trivial zero-matrix solution. Can you explain the last step, where you give a closed-form solution for $X$, in a little more detail? Or some pointers, as I may be missing something. Thanks!2012-11-19
  • 0
    Are you saying the minimum of $\{\sum_{i=1}^m\lambda_i(A)\lambda_{\sigma(i)}(\tilde{B}): \sigma\in S_m\}$ is attained by $\sum_{i=1}^m\lambda_i(A)\lambda_i(\tilde{B})$ when $\lambda_i(A)$ and $\lambda_i(\tilde B)$ are arranged in the same order? But if $\lambda_1(A) = \lambda_1(\tilde B) = 1$ and $\lambda_2(A) = \lambda_2(\tilde B) = 2$, then $1\cdot1 + 2\cdot2$ is the maximum, not the minimum.2012-11-20
  • 0
    No, the eigenvalues of $\tilde{B}$ shall be arranged in reverse order. I should have corrected it, but curiously the edit disappeared into thin air in my subsequent edits.2012-11-20
  • 0
    @user1551 Just read the proof; I really liked the bit with Birkhoff's theorem.2012-12-15
  • 1
    @dineshdileep Me too. When I first read that proof with Birkhoff's theorem, I shouted "wow!" in my heart.2012-12-15
1

As I read the question, some papers I recently studied came to mind as a quick recommendation for an orthogonal solution to the problem. The papers "Dynamical systems that sort lists, diagonalize matrices, and solve linear programming problems" and "Least squares matching problems" present a dynamical-systems approach that finds a solution in the space of orthogonal matrices. However, in Brockett's papers the matrices $A$ and $B$ should have distinct eigenvalues (which affects the number of local minima) and $X$ is a square matrix. Still, they may be inspiring.
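
For concreteness, here is a rough numerical sketch of a double-bracket flow in the spirit of Brockett's papers, applied to the reduced diagonal problem $\min_{\tilde Q}\mathrm{tr}(\Lambda\tilde Q^T\Sigma\tilde Q)$ from the answer above, via $H=\tilde Q^T\Sigma\tilde Q$. The sign convention (chosen so that the flow decreases $\mathrm{tr}(HN)$), the test eigenvalues, the step size and the time horizon are my own choices and are not taken from the papers.

```python
import numpy as np

rng = np.random.default_rng(3)

lam = np.array([1.0, 2.0, 3.0, 4.0])          # hypothetical eigenvalues of A (the matrix N below)
sig = np.array([1.5, 2.5, 3.5, 4.5])          # hypothetical eigenvalues of B (the spectrum of H)
N = np.diag(lam)

W = np.linalg.qr(rng.standard_normal((4, 4)))[0]
H = W @ np.diag(rng.permutation(sig)) @ W.T   # random starting point on the isospectral orbit

def bracket(P, Q):
    return P @ Q - Q @ P

dt, steps = 1e-4, 100_000                     # crude forward-Euler integration
for _ in range(steps):
    H = H - dt * bracket(H, bracket(H, N))    # this direction decreases tr(H N)

# From a generic start, H converges to a diagonal matrix whose entries are (approximately)
# sig sorted in descending order, and tr(H N) approaches the closed-form minimum (= 25 here).
print(np.round(np.diag(H), 2))
print(np.trace(H @ N), np.dot(np.sort(lam), np.sort(sig)[::-1]))
```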