1
$\begingroup$

I am looking for one or more algorithms that cluster points in non-euclidian vector space.

My axes, specifically, are X and Y in space and Z in time.

I was thinking about first clustering in X and Y only, then going ahead and internally clustering these by time.

I only know of graph-clustering algorithms so I'd be glad if anyone can give me some pointers.

  • 0
    I mean that the distance between points is not defined. e.g. the distance between two points in space is |a-b| = sqrt(x*x+y*y) where x=a_x-b_x and y=a_y-b_y, the distance in time is simply |a-b| measured in a timespan. But you can't put a distance to a point in space *and* time at the same time. At least not that I knew of. (well I think that logically you can, but I dunno if it makes sense)2012-12-09

1 Answers 1

0

The k-means++ algorithm is a great method of clustering that is relatively quick. It isn't guarenteed to give you the best possible results, but if you are looking for a quick method to get 80%-90% of the solution then I would highly reccomend it.

Below is a high level description of the document, but if you need the details, you might try the wikipedia page here.

Before you start you might want to consider trying to scale the data points so that one dimension does not dominate. Another parameter you will have to initially determine is how many clusters you want. I usually just do a parametric sweep on the number of clusters.

Initially, the first cluster center is chosen at random from the data points. The next cluster center is chosen in such a way that the points that are farthest from any cluster are more likely to be chosen. This continues until all the clusters have an initial center.

Now that the initial clusters have been chosen, assign each data point to the nearest cluster. Then update the cluster by setting it to the mean of all the data points associated with it. Repeat the process of determining which cluster each point is associated, and updating the mean until the location of the clusters do not change.