0
$\begingroup$

I have hopped over from stackoverflow to get help on something that's more math-ey than I can handle :-)

Is there an algorithm that you can think of that would transform an (exhaustive) list of distances between X points into a graph with a minimum number of edges?

e.g.

    A    B    C    D  A   x    5    6    8  B   5    x    4    1  C   6    4    x    3  D   8    1    3    x 

I could easily just build 3 edges out of each node to create a working graph, but I am looking for an algorithm that reduces the number of edges by using some nodes as intermediaries where possible, instead of a fully interconnected graph.

Thanks and if this is too algorithmic and not mathematical enough, fell free to close the question.

  • 0
    What is the relationship between the distances and the graph? Are the distances supposed to describe the length of the shortest path between the vertices? Are you guaranteed in advance that at least one such graph exists?2012-06-26
  • 0
    You can go from $D$ to $A$ in $8$, but if you to from $D$ to $B$ and then to $A$ you only need $1+5=6$?2012-06-26
  • 0
    The distances describe the levels of 1 to 1 'closeness' as has been calculated by another algorithm. The purpose of the graph would be to extract some additional insight on how the 1-1 closeness may translate to global structure.2012-06-26
  • 0
    Listing, the numbers are made up but something like that may come up as the 1-1 distances are not based on exact measures, but instead are the output of an algorithm that takes many attributes into account and spits out a rating.2012-06-26
  • 1
    In that case it is pretty unnatural to name it distance.2012-06-26
  • 0
    Fair enough... .2012-06-26

1 Answers 1

0

The typical method for handling this situation is by thresholding. In your example, we could create a graph with vertex set $\{A,B,C,D\}$ and put an edge between vertex $X$ and vertex $Y$ if the distance between $X$ and $Y$ is $\geq 6$, say. In this case, the graph would be:

Graph obtained by thresholding

Problems like this arise in the study of real-world networks, where the "distance" may represent a wide variety of things, for example,

  • Euclidean distance between atoms in a protein,
  • signal strength between neurons, or
  • similarity between documents.

(Note: graph theory has another notion of "distance".)

The threshold parameter is somewhat arbitrary, and is usually tweaked until a manageable edge density is achieved (or, more cynically, until the results the researcher wants are achieved).

Caution: there is a danger that this process will oversimplify a problem. However, without thresholding, more sophisticated methods will be needed to study the system (which might be impractical).

  • 0
    Thx Doug... Did not end up using this, but better late than never!2012-12-28