2
$\begingroup$

I am working on a computer program. It can be represented as a linear series of states $s_0 \ldots s_n$

Function $N(s_x)$ is defined such that:

  • With a probability $p$ it will return the next state s_x \rightarrow \left\{ \begin{array}{ll} s_{x+1} & \mbox{if } x \lt n \\ s_{x} & \mbox{otherwise} \end{array} \right.
  • With a probability of $p$ it will return the previous state s_x \rightarrow \left\{ \begin{array}{ll} s_{x-1} & \mbox{if } x \gt 0 \\ s_{x} & \mbox{otherwise} \end{array} \right.
  • With a probability of $1-2p$ it will return the current state $s_x \rightarrow s_x$

The program is an iterative application of $N$ to the initial value $s_i$ $r$ times $Program = N^r(s_i)$

I need to find $P(N^r(s_i) = s_x)$ for all $0 \leq x \leq n$ (the probability that applying $N$ iteratively to $s_i$ $n$ times yields the state $s_x$)

For small values I've been manually working out the different combinations that can lead to a particular state. For example, if $n=2$, $i=1$, $p=0.25$ and $r=2$ I can create a tree to help figure out the probabilities:

A tree if n=2, i=1, p=0.25 and r=2

In this case: $P(s_0) = 0.25*0.25 + 0.25*0.5 + 0.5*0.25 = 0.3125$ $P(s_1) = 0.25*0.25 + 0.5*0.5 + 0.25*0.25 = 0.375$ $P(s_2) = 0.5*0.25 + 0.25*0.5 + 0.25*0.25 = 0.3125$

However, this obviously gets impractical for the larger values I need to deal with in real life (e.g. $r=500$). Is there an easier way to calculate the probabilities?

3 Answers 3

5

You've got a time-homogeneous Markov chain with finite state space, and you can diagonalize the transition matrix and decompose your initial state into its eigenvectors; then applying $N$ $r$ times just becomes multiplying each component with the $r$-th power of the corresponding eigenvalue. Of course this will only be more efficient than computing the $r$ steps directly if $n \ll r$.

2

Your probability is flowing according to the heat equation, and your boundary conditions are equivalent to a mirror, so if $i$=20 and $n$=99, then $s_x$ is the same as for an infinite line of heat flow initialized with point hotspots at ...,-221,-180,-21,20,179,220,279,... .

The heat at a point is just the sum of the contributions from these spots, and the contribution from a spot can be quickly estimated as $\frac{\exp{(\frac{-d^2}{4pr})}}{\sqrt{4\pi pr}}$, where $d$ is the distance from the heat source to the point of measurement. These contributions die off very quickly (although any spot can contribute in principle if the time $r$ gets big enough), so you don't need to include spots from far away in your estimate.

2

1. Exact Values

You get exact values using matrix multiplication, as explained in joriki's answer: $P(N^r(s_i) = s_x)$ is the value in the $x$th column and $i$th row of the $r$th power of the transition matrix $P$. In your worked out example, the matrix is $P=\pmatrix{3/4&1/4&0\cr 1/4&1/2&1/4\cr 0&1/4&3/4}.$ The second power is $P^2=\pmatrix{10/16&5/16&1/16\cr 5/16&6/16&5/16\cr 1/16&5/16&10/16}$ and (as expected) your worked out probabilities $.3125$, $.3750$, and $.3125$ are the second row of $P^2$. That's because the second row corresponds to the starting position $i=1$. (Note: The rows and columns of $P$ are labelled from $0$ to $n$.)

For larger $r$, you take higher powers of the matrix. When $r=10$ we get $P^{10}=\pmatrix{ {189525\over 524288}&{349525\over 1048576}&{320001\over 1048576}\cr {349525\over 1048576}& {174763\over 524288}&{349525\over 1048576}\cr {320001\over 1048576}& {349525\over 1048576}& {189525\over 524288}}$

2. Approximation

For large $r$ values, $P(N^r(s_i) = s_x)\approx 1/(n+1)$ for any $i$ and $x$. That is because the equilibrium distribution for your process is uniform on the state space $\{0,1,\dots,n\}$. In the long run, your process is equally likely to be in any state, regardless of starting position.

For instance, in your worked out example, the probabilities are all approximately $1/3$. When $r=2$, the approximation isn't that great, but the convergence is exponentially fast. For $r=10$, the middle row of the matrix $P^{10}$ says that $P(s_0)=.3333330154$, $P(s_1)=.3333339691$, and $P(s_2)=.3333330154$. These are all pretty close to $1/3$.