7

Let $A, X\in\mathbb{R}^{n\times n}$. The scalar objective function is $J=\mathrm{tr}(AX)$. Without constraints, setting the derivative of $J$ with respect to $X$ to zero gives $A=0$. Since $A$ is itself a function of other variables, from $A=0$ I can calculate something further.

My question is: what if $X$ is constrained to be an orthogonal matrix, i.e., $X^TX=I$? Then it becomes a constrained optimization problem. Can I still use matrix differential techniques to derive the minimizer? Thanks.
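For concreteness, here is a small numerical sanity check (a NumPy sketch with illustrative variable names) that the unconstrained gradient of $\mathrm{tr}(AX)$ with respect to $X$ is $A^T$, so that stationarity indeed forces $A=0$:

```python
# Finite-difference check that d tr(AX)/dX_{ij} = A_{ji}, i.e. the gradient is A^T.
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
eps = 1e-6

grad_fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        grad_fd[i, j] = (np.trace(A @ (X + E)) - np.trace(A @ (X - E))) / (2 * eps)

print(np.allclose(grad_fd, A.T, atol=1e-6))  # True: gradient of tr(AX) w.r.t. X is A^T
```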

2 Answers

6

The method of Lagrange multipliers yields the solution.

One studies the (real-valued) function $F(X)$ with the (matrix) constraint $C(X)=I$, where $F(X)=\operatorname{trace}(AX)$ and $C(X)=X^TX$, hence
$$F(X)=\sum_{i,j}A_{ji}X_{ij},\qquad C_{ij}(X)=\sum_{k}X_{ki}X_{kj}.$$
The gradients of $F$ and $C$ are given by $\partial_{ij}F(X)=A_{ji}$ and
$$\partial_{ij}C_{k\ell}(X)=X_{i\ell}[k=j]+X_{ik}[\ell=j].$$
One wants there to exist multipliers $\lambda_{k\ell}$ such that, for every $i$ and $j$,
$$\partial_{ij}F(X)=\sum_{k,\ell}\lambda_{k\ell}\,\partial_{ij}C_{k\ell}(X).$$
This condition reads
$$A_{ji}=\sum_{k,\ell}\lambda_{k\ell}\left(X_{i\ell}[k=j]+X_{ik}[\ell=j]\right)=\sum_{\ell}\lambda_{j\ell}X_{i\ell}+\sum_{k}\lambda_{kj}X_{ik},$$
or, equivalently, introducing the matrix $\Lambda=(\lambda_{ij})$,
$$A^T=X\Lambda^T+X\Lambda.$$
The matrix $M=\Lambda^T+\Lambda$ satisfies $M^T=M$ and $XM=A^T$, hence $X^TX=I$ implies that $M$ must satisfy $M^TM=M^2=AA^T$.
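As a quick numerical check of the constraint gradient used above, here is a small NumPy sketch (finite-difference step and variable names are illustrative):

```python
# Finite-difference check of d(X^T X)_{kl} / dX_{ij} = X_{il}[k==j] + X_{ik}[l==j].
import numpy as np

rng = np.random.default_rng(1)
n = 3
X = rng.standard_normal((n, n))
eps = 1e-6

ok = True
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        dC = ((X + E).T @ (X + E) - (X - E).T @ (X - E)) / (2 * eps)
        for k in range(n):
            for l in range(n):
                formula = X[i, l] * (k == j) + X[i, k] * (l == j)
                ok &= abs(dC[k, l] - formula) < 1e-6
print(ok)  # True
```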

When $A$ is invertible, one can use the usual definition of the square root of a symmetric positive definite matrix to find $M$. The maximizing and minimizing matrices $X$ and the maximum and minimum values of $F$ are
$$X_0=A^TM^{-1}=\pm A^T(AA^T)^{-1/2},\qquad F(X_0)=\pm\operatorname{trace}((AA^T)^{1/2}).$$
When $A$ is not invertible, the formula $X_0=A^TM^{-1}$, whichever meaning one gives to the notation $M^{-1}$ (for example, a pseudo-inverse), cannot yield an orthogonal matrix $X_0$: if $A$ is not invertible, neither is $A^T$, nor is $A^T$ times any matrix. On the other hand, the formula $F(X_0)=\pm\operatorname{trace}((AA^T)^{1/2})$ might still be valid.
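One can verify this closed form numerically. The following NumPy sketch (with a random test matrix $A$, almost surely invertible) checks orthogonality of $X_0$, the extreme values, and compares against random orthogonal matrices:

```python
# Check of X_0 = ±A^T (A A^T)^{-1/2} and the extreme values ±trace((A A^T)^{1/2}).
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))                 # almost surely invertible

# symmetric square root M of A A^T via eigendecomposition
w, V = np.linalg.eigh(A @ A.T)
M = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

X_max = A.T @ np.linalg.inv(M)                        # maximizer
print(np.allclose(X_max.T @ X_max, np.eye(n)))        # True: X_max is orthogonal
print(np.isclose(np.trace(A @ X_max), np.trace(M)))   # True: maximum value
print(np.isclose(np.trace(A @ (-X_max)), -np.trace(M)))  # True: minimum value

# No random orthogonal matrix should beat the claimed maximum.
best = max(np.trace(A @ np.linalg.qr(rng.standard_normal((n, n)))[0]) for _ in range(200))
print(best <= np.trace(M) + 1e-9)                     # True
```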

  • 0
    @Did and joriki: The polar decomposition theorem for $A$ states that one can always factorize $A=RU$, where $U$ is the square root of $A^TA$ and $R$ is orthogonal. If I remember right, invertibility of $A$ gives a unique orthogonal $R$. Thus, in the given problem, the extremizer $X$ is not unique when $A$ is not invertible, but the value of the extremum is the same; see the sketch below. – 2017-11-05
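A quick numerical illustration of this point, as a sketch using scipy.linalg.polar (which, with side='right', computes exactly the factorization $A=RU$ described in the comment):

```python
# Polar decomposition check: A = R U with R orthogonal and U = (A^T A)^{1/2}.
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n))

R, U = polar(A, side='right')                     # A = R @ U
print(np.allclose(A, R @ U))                      # True
print(np.allclose(R.T @ R, np.eye(n)))            # True: R is orthogonal
print(np.allclose(U @ U, A.T @ A))                # True: U = (A^T A)^{1/2}

# The maximizer of trace(AX) over orthogonal X is X = R^T, with value trace(U).
print(np.isclose(np.trace(A @ R.T), np.trace(U)))  # True
```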
2

Another way to get did's answer is to use the SVD to reduce to the case of $A$ being diagonal: We have $A = USV'$, where $U$ and $V$ are orthogonal and $S$ is diagonal, and so

$\def\Tr{\operatorname{Tr}}\Tr(AX) = \Tr(SV'XU)\;.$

Since $X\to V'XU$ is a bijection on the set of orthogonal matrices,

$\min\{\Tr(AX)\mid X\text{ orthogonal}\} = \min\{\Tr(SW)\mid W\text{ orthogonal}\}\;.$

But $\Tr(SW) = \sum_iS_{ii}W_{ii}$, and all the $S_{ii}$ are non-negative and $-1\le W_{ii}\le1$, so that $\Tr(SW)\ge\Tr(S(-I))$, and so the minimum occurs at $W=-I$ (or $X=-VU'$) and is $-\sum_iS_{ii}$.
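A short numerical check of this SVD argument (a NumPy sketch with a random test matrix; note that numpy's svd returns $V'$ directly as its third factor):

```python
# Check of the SVD-based minimizer X = -V U' and minimum value -sum_i S_ii.
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))

U, S, Vt = np.linalg.svd(A)          # A = U @ diag(S) @ Vt, so Vt plays the role of V'
X_min = -(Vt.T @ U.T)                # X = -V U'

print(np.allclose(X_min.T @ X_min, np.eye(n)))    # True: X_min is orthogonal
print(np.isclose(np.trace(A @ X_min), -S.sum()))  # True: minimum is minus the sum of singular values
```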

  • 0
    I formatted your answer because I thought it's important to have this solution worked out here; but it was quite a bit of work; I encourage you to take a look at the [formatting FAQ](http://meta.math.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference); it's not as hard as it might seem. – 2012-10-30