2
$\begingroup$

I'm testing out a few different kind of metrics for a reranking application. I was curious and wanted to try running the PageRank algorithm to see if it could help at all. I used a similarity metric to generate "links" between items in the list that needed reranking.

The PageRank works by simulating a random web surfer. We create a matrix that represents the links between pages and then calculate the probability of surfing to another page. This is done by iterative multiplication of the NxN transition matrix by the 1xN matrix representing the probability of moving to each of the pages. I followed the simple implementation outlined here.

My first problem was that I had a symmetric similarity metric; a completely symmetrical link matrix resulted in equal ranks for every single page. I tried using a less symmetrical metric and produced the following matrix:

  7 -26 -32  -14   8 -14  -20 -14   8  

The transition matrix then was:

-0.09 0.49 0.60  0.66 -0.33 0.66   0.73 0.52 -0.24  

And after running the algorithm all three pages were ranked 0.39, which is useless. My question is how to tell if a matrix will end up producing equal PageRanks. If I knew the characteristics of the required matrix, I could search for suitable metric to fill the matrix and then experiment properly with the algorithm.

  • 1
    Not really necessary $f$or this question, but I don't think it'll hurt to post it. AMS had a [feature column](http://www.ams.or$g$/samplings/feature-column/fcarc-pagerank) on PageRank.2011-08-06

2 Answers 2

11

As pointed out by Byron Schmuland in the above comments, a probability matrix should be a square matrix with (a) nonnegative entries and (b) each row sum equal to 1. Such properties have "nothing to do with the web", and negative numbers are not just "not normal", but completely nonsensical.

Given a probability matrix $A$, the page ranks $\pi=(\pi_1,\pi_2,...,\pi_N)$ are the equilibrium probabilities such that $\pi A = \pi$. When all entries of $A$ are positive, the Perron-Frobenius theorem guarantees that these equilibrium probabilities are unique. If $A$ is also symmetric, its column sums are equal to row sums, and hence all equal to 1. Therefore, $\pi=(1/N)(1,1,...,1)$ is the unique probability vector that satisfies $\pi A = \pi$, i.e. all page ranks must be equal to each other.

Edit: An afternote. In Google's original ranking algorithm, the probability matrix $A$ is made entrywise positive by assuming that a person will have some probability $1-d$ of visiting a page according to the uniform distribution (hence he will visit each page $i$ with probability 1/N >0), or probability $d$ of visiting a page according to the link structure of the whole internet. This is what the "90/10 rule" in the pdf file you mentioned refers to (with $d=0.9$), but in the first paper on the PageRank algorithm published by Sergey Brin and Lawrence Page, $d$ is actually $0.85$ and it is called a "damping factor" in the paper. I don't think Brin & Page were aware of the Perron-Frobenius theorem. They perhaps decided to add this damping factor after a number of trials and errors.

2

Are you sure you've ran the algorithm correctly? As far as I understand what you are doing, you're about to compute $\lim_{n\to\infty} vA^n$, where $v$ is a starting vector, in your case probably (1/3,1/3,1/3) and $A$ is the "transition matrix". Perhaps you have computed $\lim_{n\to\infty} A^n v^T$ instead?

In this case I get that it converges to something like (0.42,0.29,0.36). [You may notice that these values do not add up to 1, as probabilities should - but neither do the rows of your transition matrix. But this might be becaus of rounding you did before posting the matrix here.]

Let me also note that the results seems to be rather unstable - when I changed the second row of the matrix to (2/3,-1/3,2/3), the result was quite different.

However, what you have asked was: When PageRank algorithms produces a multiple of (1,1,1). Note that $\lim_{n\to\infty} vA^n$ converges to some vector $v_0$ if and only if 1 is the largest eigenvalue of $A$ and that this vector is eigenvector of $A$. (This does not depend on whether values in the matrix are non-negative.) This can be considered a special case of power method.

So you simply have to test whether $(1,1,1)A=(1,1,1)$.


Of course it is possible, that I misunderstood what exactly you mean by running PageRank on this matrix. (It might help if you explained how you obtained transition matrix from the link matrix.)