
Let $\{X_1,\dots,X_K\}$ be a set of random matrices, where $X_k\in\mathbb{R}^{M\times N}$, $k=1,\dots,K$, and let $U\in\mathbb{R}^{M\times r}$ and $V\in\mathbb{R}^{N\times r}$ be two matrices with orthonormal columns (i.e., $U^\top U = I$, $V^\top V = I$). I was wondering whether the following problem has an analytical solution:

$$\displaystyle\max_{U,V} \sum_{k=1}^K \|U^\top X_k V\|_F^2$$

If not, how should I solve it? Alternating optimization?

(At first, I thought it may be related to the SVD of the sum of the matrices $\{X_k\}$, but so far I have no hint to prove it.)

  • @Rodrigo de Azevedo Corrected. Sorry for the confusion. (2017-02-26)
  • @Rodrigo de Azevedo Thanks for the comments. Any idea for this question? (2017-02-26)
  • @Rodrigo de Azevedo I think so... If I remove $U$ or $V$, this would be the objective function of 2D-PCA. (2017-02-26)
  • Do you see this as being a question about statistics or machine learning? It looks like a pure math question to me. We can migrate this to the [math.SE] SE site for you. (2017-03-23)

2 Answers


$$\begin{array}{ll} \text{maximize} & \displaystyle\sum_{k=1}^{K} \| \mathrm U^{\top} \mathrm X_k \mathrm V \|_{\text F}^2\\ \text{subject to} & \mathrm U^{\top} \mathrm U = \mathrm I\\ & \mathrm V^{\top} \mathrm V = \mathrm I\end{array}$$

Let

$$f_k (\mathrm U, \mathrm V) := \| \mathrm U^{\top} \mathrm X_k \mathrm V \|_{\text F}^2 = \mbox{tr} \left( \mathrm U^{\top} \mathrm X_k \mathrm V \mathrm V^{\top} \mathrm X_k^{\top} \mathrm U \right) = \mbox{tr} \left( \mathrm V^{\top} \mathrm X_k^{\top} \mathrm U \mathrm U^{\top} \mathrm X_k \mathrm V\right)$$
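As a quick numerical sanity check (my addition, not part of the original answer), both trace identities can be verified with NumPy; the dimensions $M=5$, $N=4$, $r=2$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, r = 5, 4, 2
X = rng.standard_normal((M, N))
# QR gives matrices with orthonormal columns, as required.
U, _ = np.linalg.qr(rng.standard_normal((M, r)))
V, _ = np.linalg.qr(rng.standard_normal((N, r)))

f_norm = np.linalg.norm(U.T @ X @ V, "fro") ** 2
f_tr1 = np.trace(U.T @ X @ V @ V.T @ X.T @ U)
f_tr2 = np.trace(V.T @ X.T @ U @ U.T @ X @ V)

assert np.allclose(f_norm, f_tr1)
assert np.allclose(f_norm, f_tr2)
```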

Hence,

$$\partial_{\mathrm U} \, f_k (\mathrm U, \mathrm V) = 2 \,\mathrm X_k \mathrm V \mathrm V^{\top} \mathrm X_k^{\top} \mathrm U$$

$$\partial_{\mathrm V} \, f_k (\mathrm U, \mathrm V) = 2 \,\mathrm X_k^{\top} \mathrm U \mathrm U^{\top} \mathrm X_k \mathrm V$$
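These gradient formulas can be checked against central finite differences; the following sketch (my addition, with arbitrary dimensions) treats $f_k$ as an unconstrained function of $\mathrm U$ and $\mathrm V$:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, r = 5, 4, 2
X = rng.standard_normal((M, N))
U = rng.standard_normal((M, r))  # the formulas hold for arbitrary U, V
V = rng.standard_normal((N, r))

f = lambda U, V: np.linalg.norm(U.T @ X @ V, "fro") ** 2
grad_U = 2 * X @ V @ V.T @ X.T @ U   # claimed partial derivative w.r.t. U
grad_V = 2 * X.T @ U @ U.T @ X @ V   # claimed partial derivative w.r.t. V

# Central finite differences for the U-gradient, entry by entry.
eps = 1e-6
num_U = np.zeros_like(U)
for i in range(M):
    for j in range(r):
        E = np.zeros_like(U)
        E[i, j] = eps
        num_U[i, j] = (f(U + E, V) - f(U - E, V)) / (2 * eps)

assert np.allclose(grad_U, num_U, atol=1e-5)
```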

Let the Lagrangian be

$$\mathcal L (\mathrm U, \mathrm V, \Lambda_1, \Lambda_2) := \sum_{k=1}^{K} f_k (\mathrm U, \mathrm V) - \langle \Lambda_1, \mathrm U^{\top} \mathrm U - \mathrm I \rangle - \langle \Lambda_2, \mathrm V^{\top} \mathrm V - \mathrm I \rangle$$

where the Lagrange multipliers $\Lambda_1$ and $\Lambda_2$ are symmetric matrices. Taking the partial derivatives with respect to $\mathrm U$ and $\mathrm V$,

$$\partial_{\mathrm U} \mathcal L (\mathrm U, \mathrm V, \Lambda_1, \Lambda_2) = 2 \sum_{k=1}^{K} \mathrm X_k \mathrm V \mathrm V^{\top} \mathrm X_k^{\top} \mathrm U - 2 \mathrm U \Lambda_1$$

$$\partial_{\mathrm V} \mathcal L (\mathrm U, \mathrm V, \Lambda_1, \Lambda_2) = 2 \sum_{k=1}^{K} \mathrm X_k^{\top} \mathrm U \mathrm U^{\top} \mathrm X_k \mathrm V - 2 \mathrm V \Lambda_2$$

Finding where the partial derivatives vanish, we obtain two cubic matrix equations in $\mathrm U, \mathrm V, \Lambda_1, \Lambda_2$ and two quadratic matrix equations in $\mathrm U$ and $\mathrm V$

$$\boxed{\begin{array}{rl} \displaystyle\sum_{k=1}^{K} \mathrm X_k \mathrm V \mathrm V^{\top} \mathrm X_k^{\top} \mathrm U &= \mathrm U \Lambda_1\\ \displaystyle\sum_{k=1}^{K} \mathrm X_k^{\top} \mathrm U \mathrm U^{\top} \mathrm X_k \mathrm V &= \mathrm V \Lambda_2\\ \mathrm U^{\top} \mathrm U &= \mathrm I\\ \mathrm V^{\top} \mathrm V &= \mathrm I \end{array}}$$

How can one solve these matrix equations? I do not know.
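One common heuristic, hinted at in the question, is alternating maximization: with $\mathrm V$ fixed, the first boxed condition says the columns of $\mathrm U$ span an invariant subspace of $\sum_k \mathrm X_k \mathrm V \mathrm V^{\top} \mathrm X_k^{\top}$, and the maximizing choice is its top-$r$ eigenvectors (and symmetrically for $\mathrm V$). A minimal NumPy sketch of this local method (my addition; it is not guaranteed to reach the global maximum):

```python
import numpy as np

def top_r_eigvecs(A, r):
    # Orthonormal eigenvectors of symmetric A for its r largest eigenvalues.
    w, Q = np.linalg.eigh(A)
    return Q[:, np.argsort(w)[::-1][:r]]

def objective(Xs, U, V):
    return sum(np.linalg.norm(U.T @ X @ V, "fro") ** 2 for X in Xs)

def alternating_max(Xs, r, n_iter=100, seed=0):
    # Block-coordinate ascent: each update solves one of the boxed
    # stationarity conditions exactly and never decreases the objective.
    rng = np.random.default_rng(seed)
    N = Xs[0].shape[1]
    V, _ = np.linalg.qr(rng.standard_normal((N, r)))
    for _ in range(n_iter):
        U = top_r_eigvecs(sum(X @ V @ V.T @ X.T for X in Xs), r)
        V = top_r_eigvecs(sum(X.T @ U @ U.T @ X for X in Xs), r)
    return U, V

# Example: K = 3 random 6x5 matrices, r = 2.
rng = np.random.default_rng(1)
Xs = [rng.standard_normal((6, 5)) for _ in range(3)]
U, V = alternating_max(Xs, r=2)
```

Since each block update is an exact maximization over its block, the objective is monotone non-decreasing, so the iteration converges in objective value (though possibly to a local maximum).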


Define the variables
$$\eqalign{ Y&=UU^T &\implies y={\rm vec}(Y) \cr Z&=VV^T &\implies z={\rm vec}(Z) \cr S&=\sum_kX_k\otimes X_k }$$
Note that $(Y,Z)$ are not orthogonal, but they are ortho-projectors
$$\eqalign{ Y^2 &= Y = Y^T \cr Z^2 &= Z = Z^T \cr }$$
Similarly, the vectors $(y,z)$ are not unit vectors, but they satisfy simple normalization conditions: since $U$ and $V$ have $r$ orthonormal columns, $y^Ty = {\rm tr}(Y^TY) = {\rm tr}(Y)$ and likewise for $z$, so
$$\eqalign{ y^Ty &= {\rm tr}(Y) = r \cr z^Tz &= {\rm tr}(Z) = r \cr }$$
Write the objective function as
$$\eqalign{ f &= \sum_k \,\|U^TX_kV\|^2_F \cr &= \sum_k (U^TX_kV):(U^TX_kV) \cr &= \sum_k (YX_k):(X_kZ) \cr &= \sum_k {\rm vec}(YX_k)^T {\rm vec}(X_kZ) \cr &= \sum_k y^T(X_k\otimes I_M)(I_N\otimes X_k)z \cr &= y^TSz \cr }$$
where the colon denotes the inner/Frobenius product, $A:B = {\rm tr}(A^TB)$.

The main result of these manipulations is to recast the objective as a single bilinear form in $(y,z)$. The (unconstrained) vector gradients are then $$\frac{\partial f}{\partial y} = Sz, \qquad \frac{\partial f}{\partial z} = S^Ty$$ This still isn't a solution to the problem, just another way of thinking about it.
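The identity $f = y^TSz$ can likewise be checked numerically; this sketch (my addition) uses column-major vec to match the Kronecker convention ${\rm vec}(ABC) = (C^T\otimes A)\,{\rm vec}(B)$, with arbitrary small dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, r, K = 5, 4, 2, 3
Xs = [rng.standard_normal((M, N)) for _ in range(K)]
U, _ = np.linalg.qr(rng.standard_normal((M, r)))  # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((N, r)))

Y, Z = U @ U.T, V @ V.T
y = Y.reshape(-1, order="F")          # vec(Y), column-major stacking
z = Z.reshape(-1, order="F")          # vec(Z)
S = sum(np.kron(X, X) for X in Xs)    # S = sum_k X_k (Kronecker) X_k

f_direct = sum(np.linalg.norm(U.T @ X @ V, "fro") ** 2 for X in Xs)
f_kron = y @ S @ z

assert np.allclose(f_direct, f_kron)
# The projector normalizations: y'y = tr(Y) = r, z'z = tr(Z) = r.
assert np.isclose(y @ y, r) and np.isclose(z @ z, r)
```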