
By Cover's theorem (1965), it is possible to make patterns separable by transforming the original feature space to a higher-dimensional space.

Think of the XOR problem.

[Figure: the four XOR points in the plane, with the two classes sitting at opposite corners]

It is not possible to separate the two classes with a hyperplane. One trick is to use neural networks, which transform the feature space linearly to a higher-dimensional space. Another is the kernel trick, which applies a non-linear transformation.

But can we make the classes separable using only a low-dimensional linear transformation, i.e. an $\mathbb{R}^2 \rightarrow \mathbb{R}^2$ linear transformation? I cannot find one. Is there any way to prove that such a transformation does not exist?
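
For concreteness, here is a minimal numpy sketch (my own illustration, using the XOR points in a $\pm 1$ encoding) of the kind of higher-dimensional lift mentioned above: appending the single non-linear feature $x_1 x_2$ maps $\mathbb{R}^2$ into $\mathbb{R}^3$, and there a coordinate hyperplane separates the classes.

```python
import numpy as np

# XOR in the +/-1 encoding: the label is the product of the two coordinates,
# so one class is the "same sign" corners and the other the "opposite sign" corners.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([1, -1, -1, 1])

# Explicit non-linear lift R^2 -> R^3: append the product feature x1 * x2.
Z = np.column_stack([X, X[:, 0] * X[:, 1]])

# In the lifted space the hyperplane w . z = 0 with w = (0, 0, 1) separates the
# classes, because the new third coordinate equals the label.
w = np.array([0.0, 0.0, 1.0])
print(np.sign(Z @ w))   # [ 1. -1. -1.  1.], matching y
```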

  • Neural networks aren't linear either; the activation function is typically sigmoidal. In any case, as pzf's answer shows, if neural networks were linear, they wouldn't be able to solve the XOR problem no matter how high-dimensional you made the feature space. (2012-06-26)
  • Replacing feature `x2` with a single derived column defined as (`x1` xor `x2`) would be trivially linearly separable by a hyperplane, linear regression, or perceptron (a sketch of this is below). It depends on whether you count xor as a linear transformation or not. (2017-05-02)
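
For concreteness, a minimal numpy sketch of that derived-column trick (my own illustration, with 0/1 features, where the derived column coincides with the class label):

```python
import numpy as np

# XOR with 0/1 features; the class label is x1 xor x2.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = X[:, 0] ^ X[:, 1]                                # [0, 1, 1, 0]

# Replace x2 with the derived column (x1 xor x2): still a map R^2 -> R^2,
# just not a linear one.
X_derived = np.column_stack([X[:, 0], X[:, 0] ^ X[:, 1]])

# The second coordinate now equals the label, so thresholding it at 1/2
# classifies every point correctly.
print((X_derived[:, 1] >= 0.5).astype(int))          # [0 1 1 0], matching y
```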

1 Answer


Two sets of vectors $x_i$ and $y_i$ are linearly separable if there is some vector $w$ and a constant $k$ such that $w\cdot x_i \ge k$ for all $i$ and $w\cdot y_i < k$ for all $i$. You are asking whether there is some linear (or affine, it doesn't matter) transformation that could make two sets that are not linearly separable become linearly separable.

Suppose $M$ were such a transformation. Note that $w\cdot (M x_i)=(M^T w)\cdot x_i$, where $M^T$ is the adjoint/transpose of $M$. So if $M$ allowed you to separate the $Mx_i$ from the $My_i$ with a vector $w$, the original sets would already have been separable using $M^T w$, a contradiction. Hence no linear transformation of that sort can work. (For an affine map $x \mapsto Mx + b$ the same argument applies with the threshold shifted to $k - w\cdot b$.) Alternatively: hyperplanes remain hyperplanes under full-rank linear transformations, so a separating hyperplane in the image would pull back to a separating hyperplane in the original space.
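
As a quick numerical check (my own sketch, not part of the proof): the identity $w\cdot(Mx)=(M^T w)\cdot x$ can be verified for a random $M$, and a crude random search over pairs $(w,k)$ finds no separator for the XOR points either before or after applying $M$, exactly as the argument predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR points and labels in a +/-1 encoding.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([1, -1, -1, 1])

# The identity behind the argument: w . (M x) = (M^T w) . x for every x.
M = rng.normal(size=(2, 2))
w = rng.normal(size=2)
print(np.allclose((X @ M.T) @ w, X @ (M.T @ w)))     # True

# A crude random search for a separating pair (w, k); it finds none for the
# original points, and (as the identity predicts) none for the transformed ones.
def separable(P, labels, trials=20_000):
    for _ in range(trials):
        w = rng.normal(size=2)
        k = rng.normal()
        if np.array_equal(P @ w >= k, labels > 0):
            return True
    return False

print(separable(X, y), separable(X @ M.T, y))        # False False
```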