
I am trying to solve the following minimization problem:

$$\underset{X}{\text{argmin}} \sum_{i=1}^{N} (y_i - \mathbf{a_{i}^TX^T b_i})^2$$

Where $\mathbf{a_i}$ and $\mathbf{b_i}$ are vectors and $\mathbf{X}$ is a matrix.

I have worked out the derivative:

$$\begin{align} \nabla_X L(\mathbf{X}) &= \nabla_X \sum_{i=1}^{N} (y_i - \mathbf{a_{i}^TX^T b_i})^2 \\ &= \sum_{i=1}^{N} 2(y_i - \mathbf{a_{i}^TX^T b_i})\nabla_X(y_i - \mathbf{a_{i}^T X^T b_i}) \\ &= \sum_{i=1}^{N} -2(y_i - \mathbf{a_{i}^TX^T b_i})\mathbf{b_i a_i^T} \end{align}$$

When I set the derivative to $0$ I am not sure how to derive the solution for $\mathbf{X}$:

$$\begin{align} \sum_{i=1}^{N} -2(y_i &- \mathbf{a_{i}^T X^T b_i})\mathbf{b_i a_i^T} = \mathbf{0} \\ \sum_{i=1}^{N} y_i\mathbf{b_i a_i^T} &= \sum_{i=1}^{N}(\mathbf{a_{i}^T X^T b_i})\mathbf{b_i a_i^T} \\ \mathbf{X} &= \;? \end{align}$$

  • Assuming $a_i, b_i \in \mathbb R^n$, $X \in \mathbb R^{n\times n}$, and $y_i \in \mathbb R$, you have now arrived at *one* equation which is linear in every $x_{ij}$, with $n^2$ unknowns; is that correct? (2017-01-27)
  • X is not square. (2017-01-30)

3 Answers

3

$$\sum_{i=1}^n \left( \mathrm b_i^{\top} \mathrm X \,\mathrm a_i - y_i \right)^2 = \sum_{i=1}^n \left( \mathrm b_i^{\top} \mathrm X \,\mathrm a_i \mathrm a_i^{\top} \mathrm X^{\top} \mathrm b_i - 2 y_i \mathrm b_i^{\top} \mathrm X \,\mathrm a_i + y_i^2\right)$$

Differentiating this cost function with respect to $\mathrm X$ and finding where the derivative vanishes, we obtain the following linear matrix equation

$$\sum_{i=1}^n \mathrm b_i \left( \mathrm b_i^{\top} \mathrm X \,\mathrm a_i - y_i \right) \mathrm a_i^{\top} = \mathrm O$$

which can be rewritten as follows

$$\sum_{i=1}^n \left( \mathrm b_i \mathrm b_i^{\top} \right) \mathrm X \left( \mathrm a_i \mathrm a_i^{\top} \right) = \sum_{i=1}^n y_i \mathrm b_i \mathrm a_i^{\top}$$

Vectorizing, we obtain the following linear system

$$\left( \sum_{i=1}^n \left(\mathrm a_i \mathrm a_i^{\top}\right) \otimes \left(\mathrm b_i \mathrm b_i^{\top}\right) \right) \, \mbox{vec} (\mathrm X) = \sum_{i=1}^n (\mathrm a_i \otimes \mathrm b_i) \, y_i$$
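As a numerical sanity check, the Kronecker system above can be assembled and solved directly. This is a NumPy sketch; the dimensions, variable names, and the random noiseless test instance are illustrative assumptions, not part of the answer:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 3, 4, 50                      # b_i in R^m, a_i in R^n, X in R^{m x n}
A = rng.standard_normal((N, n))         # rows are a_i^T
B = rng.standard_normal((N, m))         # rows are b_i^T
X_true = rng.standard_normal((m, n))
y = np.array([B[k] @ X_true @ A[k] for k in range(N)])   # y_i = b_i^T X a_i

# Assemble sum_i (a_i a_i^T) kron (b_i b_i^T) and the right-hand side sum_i y_i (a_i kron b_i)
M = sum(np.kron(np.outer(A[k], A[k]), np.outer(B[k], B[k])) for k in range(N))
rhs = sum(y[k] * np.kron(A[k], B[k]) for k in range(N))

x = np.linalg.solve(M, rhs)             # solves for vec(X)
X = x.reshape((m, n), order='F')        # undo the column-major (Fortran-order) vec
```

With column-major `vec`, `reshape(..., order='F')` recovers $\mathrm X$ from its vectorization; on a noiseless instance with $N > mn$, the recovered `X` matches `X_true`.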

  • Great answer! For the vectorization step, I follow that you used this identity (the encapsulating sum from p. 30 of the Matrix Cookbook): $\sum_n A_n X B_n = C$ implies $\operatorname{vec}(C) = (\sum_n B_n^T \otimes A_n) \operatorname{vec}(X)$. Should the right-hand side be $\operatorname{vec}(\sum a \otimes b)y = \sum \operatorname{vec}(a \otimes b)y$ instead... or am I mistaken? (2017-01-27)
  • @anthonybell $$\operatorname{vec} (\mathrm b_i \mathrm a_i^{\top}) = \mathrm a_i \otimes \mathrm b_i$$ (2017-01-27)
  • Thanks! I was thinking $ba^T = b \otimes a$ for some reason. (2017-01-29)
1

You're almost there. If you put things into block matrices, then you'll have: $$y^T B^TA = (A^TXB)(B^TA)$$ Now multiply by the pseudoinverses, and you have $$ (AA^T)^{-1}A \left[y^TB^TA\right]A^TBB^T(BB^TA A^TBB^T)^{-1} = (AA^T)^{-1}A \left[A^T X B B^TA \right]A^TBB^T(BB^TA A^TBB^T)^{-1} = X $$

1

You can use vectorization to rewrite the loss function as $$\eqalign { L &= \sum_k \Big(b_k^TXa_k -y_k\Big)^2 \cr &= \sum_k \Big((a_k\otimes b_k)^T{\rm vec}(X) -y_k\Big)^2 \cr &= \sum_k \Big(c_k^Tx -y_k\Big)^2 \cr }$$ where $x={\rm vec}(X)$ and $c_k=a_k\otimes b_k$ are used for convenience.

Then you can drop the explicit summation notation and write the function in terms of the matrix $C$ (whose columns are the $\{c_k\}$ vectors) and the Frobenius norm. Or better yet, use the Frobenius (:) product. $$\eqalign { L &= \|C^Tx-y\|_F^2 \cr &= (C^Tx-y):(C^Tx-y) \cr }$$ In this form, finding the gradient is easy $$\eqalign { dL &= 2\,(C^Tx-y):C^T\,dx \cr &= 2\,C(C^Tx-y):dx \cr\cr \frac{\partial L}{\partial x} &= 2\,C(C^Tx-y) \cr }$$ Setting the gradient to zero and solving $$\eqalign { CC^Tx &= Cy \cr x &= (CC^T)^{-1}Cy \cr &= \operatorname{pinv}(C^T)\,y \cr }$$
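For completeness, here is a small NumPy sketch of this least-squares formulation (the variable names and the random noiseless test instance are illustrative assumptions): build $C$ with columns $c_k = a_k \otimes b_k$ and apply the pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, N = 3, 4, 50
A = rng.standard_normal((N, n))          # rows are a_k^T
B = rng.standard_normal((N, m))          # rows are b_k^T
X_true = rng.standard_normal((m, n))
y = np.array([B[k] @ X_true @ A[k] for k in range(N)])   # y_k = b_k^T X a_k

# C has columns c_k = a_k kron b_k, so C^T x = y is the least-squares system
C = np.column_stack([np.kron(A[k], B[k]) for k in range(N)])
x = np.linalg.pinv(C.T) @ y              # x = pinv(C^T) y
X = x.reshape((m, n), order='F')         # invert the column-major vec
```

Since $c_k^T x = (a_k \otimes b_k)^T \operatorname{vec}(X) = b_k^T X a_k$ under the column-major `vec` convention, a consistent noiseless system recovers `X_true` exactly.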

  • After reformulating the problem, you should realize that you're looking for the least-squares solution of a linear system $$C^Tx=y$$ and can jump directly to the answer $$x={\rm pinv}(C^T)\,y$$ bypassing all of the gradient boilerplate. (2017-01-28)
  • @frank is right, but I really love this solution since it reformulates the problem as a least-squares problem up front, so I don't have to re-derive it if I want to add L1 or L2 regularization. (2017-01-29)