I have a set of undirected graphs, each with 6 nodes and weighted edges. I would like to compare each one with a reference graph that has the same 6 nodes but different edge weights. What is a good method for comparing them that yields a similarity score, say between zero and one?

UPDATE: To clarify, here is the problem I am looking at. I have a series of protein structures that are identical in their amino acid sequence (same nodes) but differ slightly in their spatial orientation. I want to see which amino acids interact: if two amino acids are within a certain cutoff distance, they are considered to be interacting, which gives an edge between the two nodes. The edge weight reflects how close they are: it is 1 for the closest pairs and decreases until it reaches zero. I want to see which protein structures (graphs) are similar in the way their amino acids interact.
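
For concreteness, the graph construction described above could be sketched as follows. This is only an illustration: the function name `contact_weights`, the use of one 3D coordinate per amino acid, and the linear ramp from 1 (distance zero) to 0 (at the cutoff) are all assumptions, since the exact weighting function is not specified here.

```python
import numpy as np

def contact_weights(coords, cutoff):
    """Weighted adjacency matrix from one 3D point per residue.

    Assumption (hypothetical weighting): the weight falls linearly from 1
    at distance 0 to 0 at the cutoff, and is 0 beyond the cutoff.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all residues.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Linear ramp, clipped at zero for pairs farther apart than the cutoff.
    W = np.clip(1.0 - dist / cutoff, 0.0, None)
    # No self-interactions.
    np.fill_diagonal(W, 0.0)
    return W
```

Each protein structure then gives one symmetric 6×6 matrix, and comparing structures amounts to comparing these matrices.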

  • 0
    When you say "the same nodes", do you mean that they have edges in the same places as well, i.e. visually the same network, but with different weights on the edges? 2017-01-07
  • 0
    @πr8 Not necessarily the same edges. I just said that because I wanted to emphasize that the nodes represent the same things. However, I would like to know the answer for both cases (with the same edges or without). 2017-01-07
  • 3
    Right. I'll say ahead of time that finding a "best method" is unlikely to be well-defined, at least without further information. One approach that might be useful is to consider the adjacency matrices and use them to construct some metric. 2017-01-07
  • 0
    Alright, I changed "best method" to "a good method". 2017-01-07
  • 0
    Right, but there are still very real questions of what that means: good for what? 2017-01-07
  • 0
    Yeah, "a good method" still isn't well defined. Good for what? If you give more information about how you are trying to apply this, or a list of properties you are concerned with or want this score to encapsulate, that would help a lot! For example, I think it is safe to say you want this score to be able to tell whether two graphs are identical. Are you more concerned with the placement of the edges or with the weights? Would you like a graph with a lot of small edges to be similar to a graph with a few big edges (in some sense, similar in flow capacities)? You need to be more specific. 2017-01-07
  • 0
    @SeanEnglish I updated the question and added more details. 2017-01-07
  • 0
    @πr8 I updated the question and added more details. 2017-01-07
  • 0
    Would you say that indirect interactions are important or not? In other words, if A interacts strongly with B and B interacts strongly with C, but A does not directly interact with C, do you want the indirect connection between A and C through B to still be taken into account in the similarity function? 2017-01-07
  • 0
    @NickAlger Chemically maybe, but to keep the model simple, I would like to neglect that. So indirect interactions are not important. 2017-01-07
  • 0
    Ok, then my answer below is maybe not the best. You could use the same formula, except with the adjacency matrix instead of the Laplacian, and without the pseudoinverses. That would not exclude indirect interactions. 2017-01-07
  • 0
    Err, mis-typed. Using the adjacency matrix and no pseudoinverses ($\exp\left(-\frac{||A_1 - A_2||^2}{||A_1||~||A_2||}\right)$) would not *include* the effect of indirect interactions. That is, it would only take into account direct connections. 2017-01-07
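
The adjacency-matrix variant from the last comment could look roughly like this. It is a minimal sketch: the function name `adjacency_similarity` and the choice of the Frobenius norm are assumptions, and both graphs are assumed to have at least one edge so that the denominator is nonzero.

```python
import numpy as np

def adjacency_similarity(A1, A2):
    """exp(-||A1 - A2||^2 / (||A1|| ||A2||)) with the Frobenius norm.

    A1 and A2 are symmetric weighted adjacency matrices on the same nodes.
    Only direct connections enter this score.
    """
    A1 = np.asarray(A1, dtype=float)
    A2 = np.asarray(A2, dtype=float)
    num = np.linalg.norm(A1 - A2, "fro") ** 2
    # Assumption: neither graph is empty, otherwise this product is zero.
    den = np.linalg.norm(A1, "fro") * np.linalg.norm(A2, "fro")
    return float(np.exp(-num / den))
```

Identical matrices give a score of 1, and the score decreases toward 0 as the edge weights diverge.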

1 Answer

Here is one idea. Let $G_1$ and $G_2$ be different graphs on the same vertices, let $L_1$ and $L_2$ be their graph Laplacians, and let the superscript plus symbol denote the pseudoinverse (e.g., $A^+$ is the pseudoinverse of $A$). Then one can define something like:

$$\text{similarity}(G_1, G_2) := \exp \left(-\frac{||L_1^+ - L_2^+||^2}{||L_1^+||~||L_2^+||}\right).$$

The norm $||\cdot||$ can be any matrix norm. I would recommend, for example, the Frobenius norm. Another good choice could be the induced $2$-norm.

Anyway, the idea is that the graph Laplacian is related to diffusion (e.g., of randomly moving particles, or heat) in the graph. The $i$'th column of the pseudoinverse matrix $L^+$ is the steady-state result of putting a constant source of particles (or heat, etc.) at node $i$ and letting them diffuse randomly through the graph, with the probability of transmission between two nodes related to the edge weight between those nodes. So $||L_1^+ - L_2^+||$ measures how the graphs differ physically in a diffusion process.

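Dividing by $||L_1^+||~||L_2^+||$ makes the similarity measure scale-invariant (multiplying all edge weights in both graphs by the same constant leaves it unchanged), and the exponential $\exp\left(-\dots\right)$ maps the values into the interval $(0, 1]$, with $1$ attained when the two graphs are identical.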
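
A minimal NumPy sketch of this similarity, assuming the Frobenius norm and the usual Laplacian $L = D - W$ (with $W$ the symmetric weighted adjacency matrix and $D$ the diagonal matrix of weighted degrees). The function names are hypothetical, and each graph is assumed to have at least one edge so that the denominator is nonzero.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W for a symmetric weighted adjacency matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def laplacian_similarity(W1, W2):
    """exp(-||L1^+ - L2^+||^2 / (||L1^+|| ||L2^+||)) with the Frobenius norm."""
    # Moore-Penrose pseudoinverses of the two (singular) Laplacians.
    P1 = np.linalg.pinv(laplacian(W1))
    P2 = np.linalg.pinv(laplacian(W2))
    num = np.linalg.norm(P1 - P2, "fro") ** 2
    # Assumption: neither graph is empty, otherwise this product is zero.
    den = np.linalg.norm(P1, "fro") * np.linalg.norm(P2, "fro")
    return float(np.exp(-num / den))

if __name__ == "__main__":
    # Illustrative data only: a 6-node ring, and the same ring with one weaker edge.
    n = 6
    W_ref = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        W_ref[i, j] = W_ref[j, i] = 1.0
    W_other = W_ref.copy()
    W_other[0, 1] = W_other[1, 0] = 0.2
    print(laplacian_similarity(W_ref, W_ref))
    print(laplacian_similarity(W_ref, W_other))
```

Because the pseudoinverse couples every pair of nodes through all paths between them, this score reflects indirect connections as well as direct ones.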

This is just my crazy idea, so take it with a grain of salt. I don't know if anyone else has done this already; probably someone has.