So I understand the proofs behind Singular Value Decomposition, but I'm having trouble interpreting it in the context of a real-world problem.
Specifically, if I'm given an $m \times n$ data matrix $A$, where we have $m$ training examples and $n$ features collected for each example, I'm having trouble understanding the meaning behind $Av_j = \sigma_j u_j$, where $L_A$ (left multiplication by $A$) is our linear transformation, $\beta = \{v_1, v_2, \ldots, v_n\}$ is an orthonormal basis for $F^n$, and $\gamma = \{u_1, u_2, \ldots, u_m\}$ is an orthonormal basis for $F^m$.
From reading various posts and articles, the idea seems to be that a larger $\sigma_j$ indicates more variation in the data along the corresponding direction $u_j$, while a smaller $\sigma_j$ captures a direction with less variation.
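For concreteness, here is a small numpy sketch of the variance interpretation I've been reading about. The data matrix `A` is a made-up example (my own assumption, not from any particular article), and I center each feature column first so that $\sigma_j^2 / (m-1)$ lines up with the sample variance of the data projected onto the right singular vector $v_j$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data matrix A: m = 200 examples (rows), n = 3 features (columns),
# scaled so the features have deliberately unequal spread.
m, n = 200, 3
A = rng.normal(size=(m, n)) * np.array([5.0, 1.0, 0.2])
A = A - A.mean(axis=0)  # center each feature so sigma_j relates to variance

# SVD: A = U @ diag(s) @ Vt, with singular values s in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Project the data onto each right singular vector v_j (columns of A @ Vt.T)
proj_var = np.var(A @ Vt.T, axis=0, ddof=1)

# sigma_j^2 / (m - 1) equals the sample variance along v_j
print(np.allclose(s**2 / (m - 1), proj_var))  # True
```

So on a centered data matrix, the singular values do rank the directions of the point cloud by how much the data varies along them, which is what the articles seem to mean.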
However, when we look at $Av_j = \sigma_j u_j$, I'm not sure why we care about what $L_A$ is doing. After all, this relationship would be great if I wanted to see what $L_A$ does when it acts on an orthonormal basis $\beta$, but $A$ is just a data matrix, so I'm not sure how to interpret the range of a data matrix, or what kind of transformation $A$ is performing when it left-multiplies a vector $x$.