I'm not sure if this is the best place to ask, as it's a complete layman's problem, but here we go. I've got a 2D data matrix with variable 1 (let's call it rows) and variable 2 (columns) and binary values only, where 1s count as valuable and 0s not really. From this data I need to extract the most valuable subset (mostly 1s left), in a way that favors keeping rows. In other words, I want to remove some rows (at most 25%) and some columns (up to 75%) so that I'm left with the best-fitting combination. I was thinking about something like this:
- Calculate mean for every row and every column
- Calculate mean of means for rows and columns
- For every row and column, subtract its mean from the corresponding mean of means
- Plot the distribution
- Remove "the worst" rows and columns
However, this seems very crude and may lead to unnecessary loss of information, so I'd be grateful for any hints.
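
To make the procedure concrete, here is a minimal sketch of the steps above in plain Python. The function name `trim_matrix`, the parameter names, and the cutoff rule are my own assumptions: I treat "the worst" as rows/columns whose mean is below the mean of means, and drop the lowest ones first until the 25% / 75% limits are reached. The plotting step is left out, since it only serves to pick the cutoff by eye.

```python
def trim_matrix(matrix, max_row_frac=0.25, max_col_frac=0.75):
    """Drop the lowest-scoring rows and columns of a 0/1 matrix (list of lists)."""
    n_rows, n_cols = len(matrix), len(matrix[0])

    # Mean of every row and every column (the fraction of 1s).
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]

    # Mean of means, computed separately for rows and for columns.
    row_grand = sum(row_means) / n_rows
    col_grand = sum(col_means) / n_cols

    # Mean of means minus the mean: a positive value means "worse than average".
    row_dev = [row_grand - m for m in row_means]
    col_dev = [col_grand - m for m in col_means]

    def worst(dev, cap):
        # Assumed cutoff: only below-average entries, worst first, at most `cap`.
        order = sorted(range(len(dev)), key=lambda i: dev[i], reverse=True)
        return {i for i in order[:cap] if dev[i] > 0}

    drop_rows = worst(row_dev, int(max_row_frac * n_rows))
    drop_cols = worst(col_dev, int(max_col_frac * n_cols))

    kept_rows = [i for i in range(n_rows) if i not in drop_rows]
    kept_cols = [j for j in range(n_cols) if j not in drop_cols]
    trimmed = [[matrix[i][j] for j in kept_cols] for i in kept_rows]
    return trimmed, kept_rows, kept_cols


# Toy example: one empty row and one empty column.
example = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
trimmed, kept_rows, kept_cols = trim_matrix(example)
print(kept_rows, kept_cols)  # [0, 1, 3] [0, 1, 2]
```

Note that this scores rows and columns once, on the full matrix, which is part of why it feels crude to me: removing a bad column changes every row's mean, and the sketch ignores that.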