
Let us have $n$ vectors in $\mathbb{R}^m$ and let $P \in \mathbb{R}^{m\times n}$ be the matrix whose columns are these vectors.

We aim to select $r$ columns and generate the remaining ones as linear combinations of them. Let $S \subseteq \{1,\ldots,n\}$ with $|S| = r$ be the index set of the selected columns, and let $P_S$ be the submatrix formed by those columns. Then the approximation is written as

\begin{align} P\approx P_S R \end{align} where $R\in \mathbb{R}^{r\times n}$.

We aim to determine the optimal set $S^*$, i.e., the minimizer of

\begin{align} \min_{S} \min_{R} \|P - P_S R\|_F^2 \end{align}

How can we determine this set without an exhaustive search over all combinations?
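For a fixed index set $S$, the inner minimization over $R$ is an ordinary least-squares problem with the closed-form solution $R^* = P_S^+ P$, so the objective can be evaluated for any candidate set. A minimal sketch (the function name is illustrative, not from the post):

```python
import numpy as np

def column_subset_error(P, S):
    """Squared Frobenius error ||P - P_S R*||_F^2 with the optimal R* = pinv(P_S) @ P."""
    P_S = P[:, S]                       # m x r submatrix of selected columns
    R_star = np.linalg.pinv(P_S) @ P    # least-squares-optimal coefficients, r x n
    return np.linalg.norm(P - P_S @ R_star, "fro") ** 2

# Example: a generic 5 x 8 matrix has rank 5, so any 5 columns already span
# the column space and give (numerically) zero error.
rng = np.random.default_rng(0)
P = rng.standard_normal((5, 8))
err = column_subset_error(P, [0, 2, 5])
```

The search over $S$ itself remains combinatorial; this only removes $R$ from the problem.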

1 Answer


The number $r$ has a role to play in this (obviously). Let us say the rank of your $m\times n$ matrix $P$ is $k$. If $r \ge k$, the answer is obvious: take ANY $k$ linearly independent columns, add any other columns you like to bring the total number of columns up to $r$, and the Frobenius norm above will be minimized to zero, which is its optimal value, and you will have your $P_S$. Now if $k > r$, … paper. Of course, this is just one idea that might work intuitively; I can't think of anything else right now. Anyway, I will think about it.
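The $r \ge k$ case can be checked numerically. The sketch below builds a rank-$k$ matrix and verifies that $k$ independent columns reproduce it exactly; using SciPy's QR with column pivoting to find independent columns is an assumption of this sketch, not something named in the answer.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(1)
k, m, n = 3, 6, 10
# A rank-k matrix: product of m x k and k x n random factors.
P = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

# QR with column pivoting orders columns so the first k pivots are
# linearly independent (for a rank-k matrix).
_, _, piv = qr(P, pivoting=True)
S = np.sort(piv[:k])

# With the least-squares-optimal R, the residual is (numerically) zero.
P_S = P[:, S]
R = np.linalg.pinv(P_S) @ P
err = np.linalg.norm(P - P_S @ R, "fro") ** 2
```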

  • Thank you very much for the valuable comment. I've tried [Randomized PCA](http://stats.stackexchange.com/a/11934) for huge matrices and the [ColumnSelect algorithm of Mahoney and Drineas](http://www.pnas.org/content/106/3/697.abstract), which is a very similar approach to yours. But my concern is more about the problem's nature: I want to understand how difficult it is to find the global solution. I wonder whether there exists a method that finds the best solution without exploring the whole search space. 2012-11-25
  • If such a method exists, then I believe it should be a greedy algorithm, and the data matrix should also have some desirable properties, because the original problem itself looks NP-hard (I am not sure). 2012-11-26
  • I agree, it looks NP-hard. 2012-11-26
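The greedy idea raised in the comments can be sketched as forward selection: at each step, add the column that most reduces the residual with the least-squares-optimal $R$. This is a heuristic with no global-optimality guarantee; the function and variable names are illustrative, not from the thread.

```python
import numpy as np

def greedy_column_select(P, r):
    """Greedily pick r columns, each time minimizing ||P - P_S pinv(P_S) P||_F^2."""
    n = P.shape[1]
    S, err = [], np.linalg.norm(P, "fro") ** 2
    for _ in range(r):
        best_j, best_err = None, np.inf
        for j in range(n):
            if j in S:
                continue
            P_S = P[:, S + [j]]                           # candidate submatrix
            resid = P - P_S @ (np.linalg.pinv(P_S) @ P)   # optimal-R residual
            e = np.linalg.norm(resid, "fro") ** 2
            if e < best_err:
                best_j, best_err = j, e
        S.append(best_j)
        err = best_err
    return S, err

# Usage: a rank-2 matrix is recovered exactly by 2 greedily chosen columns.
rng = np.random.default_rng(2)
P = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 9))
S, err = greedy_column_select(P, 2)
```

Each step costs a full pass over the remaining columns, so this is polynomial time, in contrast to the exponential exhaustive search over all $\binom{n}{r}$ subsets.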