15
$\begingroup$

it's my first post here, so I apologize if I broke a rule!

I'm reading Introduction to Machine Learning and got stuck on VC dimension. Here's a quote from the book:

"...we see that an axis-aligned rectangle can shatter four points in two dimensions. Then VC(H), when H is the hypothesis class of axis-aligned rectangles in two dimensions, is four. In calculating the VC dimension, it is enough that we find four points that can be shattered; it is not necessary that we be able to shatter any four points..."

enter image description here

And I don't understand that. If it's enough to find some separable combinations, why can't we just choose a "rectangle with positive examples" from the image above, put another $n$ positive ones therein, and then say $VC(H)$ increased by $n$? And if all cases must be separable then why we don't consider 4 points placed on a line - which is in general not possible to shatter by a rectangle?

The same with the linear classifier example on wikipedia VC article - on their image four points are impossible to shatter, but we can come up with a layout where it is possible. And conversely, we can put 3 points (as "+", "-", "+") on a line and it won't be possible to separate the positives from the negatives by a linear classifier.

Can anyone explain where's my mistake?

  • 0
    @Srivatsan, your comment should definitely be posted as an answer :)2014-12-31

4 Answers 4

7

I think you just need to think carefully about how your examples relate to the definition. The VC dimension of H is the maximum h such that there exists a set of cardinality h shattered by H. To show that VC(H)=4 you must show:

  • There is a set of cardinality 4 shattered by H, and
  • Any set of cardinality greater than 4 is not shattered by H

In the picture, they are doing the first thing - giving a lower bound on the VC dimension by giving an example of a set that is shattered. To show VC(H)<5 they should also show that no set of five points is shattered.

There will in general be lots of sets of various sizes that are not shattered, but that doesn't matter, essentially because VC dimension is a maximum over sets that are shattered. Your example of $n$ points on a line does not imply anything about the VC dimension. I hope this helps.

  • 0
    Aha that makes sense - thanks!2012-01-11
0

@Srivatsan's explanation makes sense to me. I had this same problem when I started learning this. You have to allow any possible point dichotomy, and then prove that you can shatter that. Hence, if you find one combination of classes for a number of points that you can't shatter, that machine must have a VC dimension smaller than that number of points. Hence, the fact that a linear classifier can't shatter XOR means that you can't handle anything larger than 3 points, which is an illustration of Burges' n(-dimension)+1 rule for linear classifier VC dimension.

0

The VC dimension of a hypothesis class, H is the cardinality of the largest set which can be shattered by a H .The requirement is you should be able to find at least one such set of points for which you can find a H that shatters (or you can always find a member of H that can classify every possible labelling of a ) particular set.

Now, consider the given set of 3 points (vertices of a triangle). You can find a h that correctly classifies every possible labelling of the set. It doesn't matter you can't do it for another set of 3 points (three points on a line). The sufficient condition is you find at least one such set.

Whereas, sure you could think of a labelling of 4 points (say the points have the same label) that a two dimensional classifier could properly classify, but can it properly classify every possible labelling of the the set of points? Think about it. You'll realise there exists no such pair.

-1

@andreister, just adding some color:

The confusion arises because the definition of VC dimension involves BOTH an existential qualifier ("there exists"), and a universal qualifier ("for all"):

Note that, if the VC dimension is h, then there exists at least one set of h points that can be shattered, but it in general it will not be true that every set of h points can be shattered Burgess (1998)

Thus, one such set of points has to exist, but their shattering == for all labelings == a universal assertion.