1
$\begingroup$

This is my reference: http://stevehanov.ca/blog/index.php?id=130

A vantage-point tree is a way of organizing a set of points so that finding the n-nearest neighbors is as efficient as possible. It builds this tree by picking a point from a set of points (called the vantage point). The median distance from that point to all other points in space is sort of the boundary that splits off that set into two equally sized subsets. This is repeated recursively for each subset that was made until no more points are left (when all points have been made vantage points).

Now searching a VP-tree is where I need help understanding. Basically, when a search is done, a new point is plotted into that space of points, and you need to find the n-nearest neighbors to that point. I especially have a problem with the "tau." and its purpose in this algorithm.

2 Answers 2

1

I think what may be confusing you is that he talks about "the farthest point that we have found" but actually means the farthest point of the up to $n$ candidates we've collected so far. Once we've collected $n$ candidates, we can prune all parts of the tree that contain only points worse than the worst of those candidates. To do that, we need to keep track of the distance to our worst candidate so far. That's $\tau$.

  • 0
    So suppose I choose the right child over the left child, and my tau is distance from the target to the right child. If tau is greater than the distance between the target and the shell of the left child, then I would also search the left child?2012-10-29
-1

You always search first the subtree where the query point belongs to, on your way down the tree, until you reach a leaf. Every time you return from the search, if the distance to the n-th neighbor found so far intersects the boundary to the other subtree, then you have to search the other subtree as well.

The idea is that the distance to the n-th neighbor will be shrinking as you go down so that you won't have to search within the other subtree (especially the subtrees closer to the root of the tree where each subtree corresponds to a lot of points excluded from the search).

Persuade yourself by drawing a rectangle, as your corpus, and a vantage point as the center of a disk denoting the median distance.

There are two cases for the query point: to fall into the disk or out of it. And two subcases for each case: the disk centered at the query point denoting the distance to its n-th neighbor to remain within the same subspace where the query point is, or to intersect the other subspace as well.