
Suppose an $n \times n$ matrix $A$ is diagonalizable with an orthonormal basis of eigenvectors $\beta = \{v_1, v_2, \dots, v_n\}$ and corresponding eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$. In addition, $A$ is just a matrix of data, where each row is a data sample with $n$ features (columns = features of the sample).

Then, if I wanted to approximate this matrix with a lower-rank matrix, would the following make sense?

First, we know that the columns of $A$ are in the range of $L_A$ when we think of $A$ as a linear transformation on some vector (where $L_A$ is left multiplication by $A$).

Any $y$ in the range of $L_A$ can be expressed as $L_A(x)$ for some $x \in F^n$, and we can express $x$ as a linear combination of the eigenvectors in $\beta$, since they form a basis. So suppose $x = a_1 v_1 + a_2 v_2 + \dots + a_n v_n$ for some scalars $a_i$. Then, since $L_A$ is linear, we can write $L_A(x)$ as

$$L_A(x) = a_1 L_A(v_1) + a_2 L_A(v_2) + \dots + a_n L_A(v_n) = a_1 \lambda_1 v_1 + a_2 \lambda_2 v_2 + \dots + a_n \lambda_n v_n,$$ using the fact that each $v_k$ is an eigenvector.

I claim that since $\beta$ consists of orthonormal eigenvectors, it would be very easy to find an $x$ in the domain such that $L_A(x)$ equals some column of our data matrix (using inner product results for orthonormal vectors to find the scalars).
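As a quick numerical sketch of this claim (the setup and variable names below are my own, for illustration): a symmetric matrix always has an orthonormal eigenbasis, the inner products $\langle y, v_k \rangle$ give the coefficients of a column $y$ in that basis, and dividing by the eigenvalues recovers an $x$ with $L_A(x) = y$ (assuming the eigenvalues are nonzero):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Symmetric matrix, so an orthonormal eigenbasis exists (assumed setup).
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
eigvals, V = np.linalg.eigh(A)   # columns of V: orthonormal eigenvectors

y = A[:, 0]                      # a column of the data matrix
coeffs = V.T @ y                 # coefficients <y, v_k> via orthonormality
x = V @ (coeffs / eigvals)       # x = sum_k (<y, v_k> / lambda_k) v_k

print(np.allclose(A @ x, y))     # True: L_A(x) recovers the column
```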

Then our vector $L_A(x)$ is likely to end up close to the sum of the first few terms above, where the eigenvectors $v_1, v_2, \dots, v_i$ have large eigenvalues, since those eigenvalues scale their $v_i$ by the largest amount; the smaller eigenvalues have less effect on where the final vector $L_A(x)$ ends up.

So it would seem that for each column of our original data matrix $A$, we could give a good approximation using just the first $d$ eigenvectors, where $d < n$. Hence we obtain a matrix whose image has lower dimension, so the rank of our data matrix is reduced.
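A small sketch of this truncation idea (my own variable names; note that I sort the eigenpairs by eigenvalue *magnitude*, since that is what makes the dropped terms small in the Frobenius norm):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 2

# Symmetric "data" matrix, so an orthonormal eigenbasis exists.
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

# Sort eigenpairs by |lambda| descending so the first d carry the most weight.
eigvals, V = np.linalg.eigh(A)
order = np.argsort(-np.abs(eigvals))
eigvals, V = eigvals[order], V[:, order]

# Rank-d truncation: A_d = sum_{k < d} lambda_k v_k v_k^T.
A_d = (V[:, :d] * eigvals[:d]) @ V[:, :d].T

print(np.linalg.matrix_rank(A_d))      # d
print(np.linalg.norm(A - A_d, "fro"))  # error comes only from dropped lambdas
```

The Frobenius error here equals $\sqrt{\lambda_{d+1}^2 + \dots + \lambda_n^2}$, i.e. exactly the weight of the eigenvalues that were thrown away.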

I realize this is essentially why SVD or PCA is used. Would the method above work? Would it allow us to drop any columns? From what I understand, that's the goal when reducing data to more manageable sizes.

Thanks!

  • You have the right idea, and this is indeed a special case of SVD. However, if you construct your matrix $A$ from some arbitrary data, there's no guarantee that it will be square, diagonalizable, or orthogonally diagonalizable. The point of SVD is that it works for arbitrary matrices. If the matrix happens to be square and symmetric, then your procedure describes (more or less) how one would use SVD to approximate your matrix with a lower-rank matrix. (2017-01-07)
  • Right, point taken that this scenario would probably never happen in practice; I'm just looking to better understand eigenvectors and eigenvalues. In the SVD case, how would we get columns to fall out? Would some just end up zero from the above procedure? @levap (2017-01-07)
  • Yeah, more or less. You would define a new operator $T$ by the formula $T(x) = a_1 \lambda_1 v_1 + \dots + a_i \lambda_i v_i$ (so that you "throw away" all the eigenvalues $\lambda_{i+1}, \dots, \lambda_n$). The matrix representing $T$ will be "smaller": it will be $\lambda_1 v_1 v_1^T + \dots + \lambda_i v_i v_i^T$, so it will depend only on $v_1, \dots, v_i$ and $\lambda_1, \dots, \lambda_i$, and not on $v_{i+1}, \dots, v_n$ and $\lambda_{i+1}, \dots, \lambda_n$. In the special case where $A$ is diagonal, this will turn the last $n - i$ columns of $A$ to zero. (2017-01-07)
  • So I see how $T(x)$ would play the role of $L_A$, where $L_A(x)$ outputs a lower-rank version of each column in the original data matrix. But then you mention the **matrix** representation of $T$, so if I call that $B$, then I agree it will be smaller, since it has fewer columns: we have decreased the number of basis (eigen)vectors we care about to $i < n$. But $B$ then is just an $i \times i$ diagonal matrix with the "larger" eigenvalues, correct? So then we can apply $B$ to some vector in $F^i$ which looks like $[a_1, a_2, \dots, a_i]^T$, since we only care about the $i$ dimensions. (2017-01-08)
  • Then this takes us to $F^i$ as a representation of scalars for our $i$-dimensional basis, so to get our final data vector we need to put these scalars in front of the $v_1, v_2, \dots, v_i$ basis vectors and sum them. Sound right or wrong? @levap (2017-01-08)
  • Sorry for the barrage, but I think I was overthinking. The above would give the final data representation, but I think a better explanation is this: since the unused $n - i$ vectors and their now-zero eigenvalues will not be there to scale those directions in the range when we compute the new linear combination, those columns of the new data matrix will likely now be zero, or close to it, or repeats, I suppose, though that seems unlikely. After all, you can't have $n$ linearly independent columns in a matrix where we've established each column is spanned by a space of dimension $i$! @levap (2017-01-08)
  • It sounds right. Let me be more precise. You have an $n \times n$ matrix $A$ and you assume that it is orthogonally diagonalizable. This means that $A$ is symmetric, so if you want to send someone $A$, you'll need to send $\frac{n(n+1)}{2}$ numbers (the entries $a_{ij}$ of $A$ for $i \leq j$). Now consider the new operator $T$, which is a rank-$i$ approximation of $L_A$. To describe it, you need to know only $\lambda_1, \dots, \lambda_i$ and $v_1, \dots, v_i$, which amounts to $n \cdot i + i$ numbers. If $i \ll n$, this is a huge improvement. (2017-01-08)
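The points made in the comments can be sanity-checked numerically (a sketch with illustrative names, not anyone's canonical method): for a diagonal $A$, keeping the first $i$ eigenpairs in $\lambda_1 v_1 v_1^T + \dots + \lambda_i v_i v_i^T$ really does zero out the last $n - i$ columns, and the storage counts from the last comment are easy to compare.

```python
import numpy as np

# Diagonal special case: truncating the eigen-expansion zeroes the
# last n - i columns (and rows).
A = np.diag([5.0, 3.0, 2.0, 1.0])

eigvals, V = np.linalg.eigh(A)   # ascending eigenvalues
order = np.argsort(-eigvals)     # reorder to lambda_1 >= ... >= lambda_n
eigvals, V = eigvals[order], V[:, order]

i = 2
T = sum(eigvals[k] * np.outer(V[:, k], V[:, k]) for k in range(i))
print(T)  # only the top-left block diag(5, 3) survives

# Storage count from the last comment: a symmetric n x n matrix needs
# n(n+1)/2 numbers; a rank-i approximation needs n*i + i.
n_big, i_big = 1000, 10
print(n_big * (n_big + 1) // 2, n_big * i_big + i_big)  # 500500 vs 10010
```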

0 Answers