By Cover's theorem (1965), patterns that are not linearly separable are likely to become linearly separable if the original feature space is non-linearly transformed into a higher-dimensional space.
Think of the XOR problem.
It is not possible to separate the two classes with a hyperplane in the original space. One trick is to use a neural network, whose hidden layers transform the feature space non-linearly into a higher-dimensional space. Another is the kernel trick, which applies a non-linear transformation implicitly.
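To make this concrete, here is a small sketch (the feature map and encoding are my own choices) of the idea on XOR in the $\pm 1$ encoding: the non-linear map $\phi(x_1, x_2) = (x_1, x_2, x_1 x_2)$ sends the points to $\mathbb{R}^3$, where the hyperplane $w = (0, 0, 1)$, $b = 0$ separates the classes:

```python
import numpy as np

# XOR in the +/-1 encoding: label +1 if the coordinates agree, -1 otherwise
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([1.0, -1.0, -1.0, 1.0])

# Non-linear feature map phi(x1, x2) = (x1, x2, x1*x2): R^2 -> R^3
Z = np.column_stack([X, X[:, 0] * X[:, 1]])

# In R^3 the hyperplane w = (0, 0, 1), b = 0 separates the two classes
w = np.array([0.0, 0.0, 1.0])
print(np.sign(Z @ w))  # -> [ 1. -1. -1.  1.], matching y
```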
But can we make the classes separable using only a low-dimensional linear transformation, i.e. a linear map $\mathbb{R}^2 \rightarrow \mathbb{R}^2$? I cannot find one. Is there any way to prove that such a transformation does not exist?
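A partial argument I considered (please correct me if it is flawed): a linear map pulls any separating hyperplane back to a separating hyperplane, so it can never create separability that was not already there. Suppose $T: \mathbb{R}^2 \rightarrow \mathbb{R}^2$ is linear and some $w \in \mathbb{R}^2$, $b \in \mathbb{R}$ separate the transformed classes:

$$w^\top T(x_i) + b > 0 \ \text{ for class } 1, \qquad w^\top T(x_j) + b < 0 \ \text{ for class } 0.$$

Writing $T(x) = Ax$ for a matrix $A$, this becomes

$$(A^\top w)^\top x_i + b > 0, \qquad (A^\top w)^\top x_j + b < 0,$$

so $w' = A^\top w$ with the same $b$ would already separate the classes in the original space. Since XOR is not linearly separable, no such $T$ can exist.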