I posted this question on Stack Overflow, but no one answered it, so I moved it to MathOverflow.
I am learning the theory of machine learning and am confused about VC dimension. According to the textbook, the VC dimension of 2D axis-aligned rectangles is 4, which means that no set of 5 points can be shattered by them.
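
To make sure I have the definition right, here is a minimal sketch of how I understand "shattering" for axis-aligned rectangles: a point set is shattered only if every one of the 2^n labelings can be realized by some rectangle. The function names and the two point sets (`diamond`, `five`) are my own illustrative choices, not taken from the textbook, and checking one 5-point set obviously does not prove the general claim.

```python
from itertools import product


def rectangle_realizes(points, labels):
    """Return True if some axis-aligned rectangle contains exactly the points labeled True."""
    positives = [p for p, lab in zip(points, labels) if lab]
    if not positives:
        # A rectangle placed away from all points labels everything negative.
        return True
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    lo_x, hi_x, lo_y, hi_y = min(xs), max(xs), min(ys), max(ys)
    # The bounding box of the positives is the smallest rectangle containing them.
    # If it also contains a negative point, every larger rectangle does too.
    return not any(
        lo_x <= x <= hi_x and lo_y <= y <= hi_y
        for (x, y), lab in zip(points, labels)
        if not lab
    )


def is_shattered(points):
    """Return True if all 2^n labelings of the points are realizable by a rectangle."""
    return all(
        rectangle_realizes(points, labels)
        for labels in product([False, True], repeat=len(points))
    )


# Hypothetical example sets: four points in a diamond, then the same plus the center.
diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
five = diamond + [(0, 0)]

print(is_shattered(diamond))
print(is_shattered(five))
```

In this sketch the check for `five` fails on the labeling where the four outer points are positive and the center is negative, since their bounding box necessarily contains the center.
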
I found an example here: Cornell

However, I still cannot understand this example. What if we use a rectangle like this (the red one)?
Then we can separate this point from the others. Why is this incorrect?
