2
$\begingroup$

I am using a Markov Chain to get the 10 best search results from the union of 3 different search engines. The top 10 results are taken from each engine to form a set of 30 results.

The chain starts at State x, a uniform distribution of set S = {1,2,3,...30}. If the current state is page $i$, select page $j$ uniformly from the union of the results from each search engine. If the rank of $j$ < rank of $i$ in 2 of the 3 engines that rank both $i$ and $j$, move to $j$. Else, remain at $i$.

I understand the above no problem. The sorting point is where I am stuck however. The paper I am using that explains this says:

This is known as a Markov process, where the transition matrix $P$ has $$P(i, j) = \frac{1}{n}$$ if a majority of the input rankings prefer j to i, and $$P(i, i) = 1−\sum_{j≠i} P(i, j)$$ Under certain conditions, this process has a unique (up to scalar multiples) limiting distribution $x$ that satisfies $x = xP$, where $x(i)$ gives the fraction of time the process spends at element $i$. Dwork et al. propose sorting the elements by non-increasing $x(i)$ values. To ensure that the process has a unique limiting distribution $x$, we use a "random jump": with probability $δ > 0$, we will choose a random element and move to this element (regardless of whether this element is preferred to the current element). In our experiments we have used $δ= \frac{1}{7}$ , which is the value of $δ$ that is often chosen in the literature for PageRank implementations.

Could someone please explain this to me in plain english because I am completely lost with it at this stage. I don't really understand the whole concept. This paper can be found here with this specific part on page 43.

It may help if I add a small example. I have 5 result pages from 3 search engines. These are their rankings:

        engine1  engine2  engine3
Page 1     1        2        2
Page 2     4        3        1
Page 3     2        4        5 
Page 4     5        5        3
Page 5     3        1        4

If page 1 is the start point, to move to page 2 it has to have a lower rank than page 1 in 2 out of 3 of the search engines. The probability of this is $\frac{x}{n}$. I don't know if n should be 3 or 5? That's my first major problem!

  • 1
    Are you familiar with what a Markov process and an equilibrium distribution are?2012-06-22
  • 0
    @GenericHuman my understanding of the Markov Process is that it is the probability of moving from one state to another, depending on a certain criteria. I don't know what equilibrium distribution is.2012-06-22

1 Answers 1

3

We have $n=|S|=5$: the states are the individual pages, so for 3 engines returning 5 possibly duplicate pages, you could have anywhere from 5 to 15 states.

Define the relation between pages $j\prec i$ if and only if $j$ is ranked better than $i$ in a majority of engines. This relation is a partial order, and lower elements are "better". The intuitive goal is to build a Markov chain following this relation to start from a random page and gradually improve the quality of the result.

Accordingly, the definition of $P$ is such that any path in the Markov chain must be decreasing in the sense of $\prec$, and it can always be extended until we land on a minimal element. Such an element is not unique in general: for example, $i$ could be ranked (1,2,3) and $j$ could be ranked (1,3,2), so that $i\not\prec j$ and $j\not\prec i$. Once we are at a minimal element, we can no longer move: we say that the distribution assigning probability 1 to $i$ is stationary for the chain (a distribution is simply a probability measure on the set of states, that is a set of weights summing to 1).

But because there are several stationary distributions, where we land depends on where we started from: we say that there is no equilibrium distribution. This is annoying because we have to pick a sensible starting state, and this won't give good results anyway since we will only get a single non-zero probability page in the stationary distribution.

So to fix that, we "shuffle" things a bit and replace the original transition with probability $\delta>0$ with a transition to a uniform random state: $$P'(i,j)=\delta/n + (1-\delta)P(i,j)$$ As a result, the expected time to visit state $i$ starting from any distribution of states is at most $n/\delta<\infty$: we say that state $i$ is positive recurrent. It's a standard property of Markov chains that when this holds for all states, there is a unique equilibrium distribution, and furthermore it has non-zero probability for each state. This means that no matter which state you start from, if you move along the chain sufficiently many times the distribution of states will get arbitrarily close to the equilibrium distribution.

So better ranked pages will tend to be "visited" more frequently by the chain: if you compute the equilibrium distribution, you then have a natural order with which to rank the whole set of pages.

The equilibrium distribution $\pi$ satisfies $$P'\pi = \pi\\ \sum_{i=1}^n \pi_i=1$$ which is a linear system of equations (and we know this sytem has a unique solution): most mathematical software libraries can easily solve that to give you $\pi$.

  • 0
    Wow that is a very comprehensive answer thank you very very much for that. I'll be honest, a lot of it is still very confusing. This is a level of mathematics far beyond my current abilities so I have to wrap my head around your explanation but thank you.2012-06-22
  • 0
    Yes, I didn't see your edit until just before posting, so given the tags and the wording of the title I had assumed your question was about the math behind it rather than about understanding the general idea. But feel free to ask questions :-)2012-06-23
  • 0
    Great! Well to start, in my example if we start at page 4, there is no way for it to match the criteria to move to another state because it ranks 5,5,3. No rank can be lower than 5 so what happens here?2012-06-23
  • 0
    I understand the ranks to be "lower is better", so that it's the other way round: each transition reduces the rank and we end up at page 1 (the goal is to visit better states more frequently, that's why we move in this direction). Once we reach page 1, we can't move any more: so at each extra step, we stay at page 1. Note that even when we are on page 5, we *can* move to page 1, but it is very probable that we will simply stay on page 5. Just because there is a step in the Markov chain doesn't mean we change state.2012-06-23
  • 0
    Also, this concerns only the original case, before introducing $\delta$: once we introduce $\delta$ there is always a non-zero probability of moving to any other state.2012-06-23
  • 0
    I was interpreting lower ranks to mean 5 being lower than 1 but your interpretation makes much more sense. Also, regarding introducing $δ$, is that done from the start or only when we encounter a zero probability of movement?2012-06-23
  • 0
    It is done from the start: the $\delta=0$ case is only useful for understanding why we need to introduce $\delta$. In a Markov chain, the transition probabilities are described by the transition matrix and never change, so we couldn't change them halfway through even if we wanted to. I highly recommend reading an introduction to this topic :-)2012-06-23
  • 0
    I have tried to find introductions to Markov chains but all of the stuff on the net seems overly complicated. If you could point me to a recommended reading it would be great. You've been a massive help thank you so much.2012-06-23
  • 0
    Searching for "Markov chain tutorial" I found [an 18-minute video](http://www.youtube.com/watch?v=uvYTGEZQTEs). I didn't watch it, but it looks fairly hands-on.2012-06-23
  • 0
    Just want to thank you for your help. That video was the best intro to Markov chains I could have had. Really can't thank you enough. I now understand how they work and how to apply it to my problem. I have one small question though. In the formula you gave me, is it $$P(i,j)=\frac{\delta}{n} + (1-\delta)*P(i,j)$$ i.e. if $\delta=0.14285...$ and $P(i,j)=0.2$ the formula is $$P(i,j)= \frac{0.14285}{n} + (1-0.14285)*(0.2)$$2012-06-24
  • 0
    Yes (but note that this is $P'(i,j)$, not $P(i,j)$). You can check that this formula makes sense by summing over all $j$, and seeing that the result is 1 for every $i$. $P'$ is the [convex combination](http://en.wikipedia.org/wiki/Convex_combination) with weights $(\delta,1-\delta)$ of the two [transition matrices](http://en.wikipedia.org/wiki/Stochastic_matrix) $R$ and $P$ where $R$ is the matrix filled with $1/n$ everywhere.2012-06-24