
The Markov property does not imply that the past and the future are independent given any information concerning the present. Find an example of a homogeneous Markov chain $\left\{X_n\right\}_{n\ge0}$ with state space $E=\left\{1,2,3,4,5,6\right\}$ such that

$P(X_2=6\mid X_1\in\{3,4\},X_0=2)\neq P(X_2=6\mid X_1\in\{3,4\})$.

I don't understand the phrasing of this question. How should I approach this?

Could this work? [transition diagram image]

$P(X_2=6\mid X_1\in\{3,4\}) = 1$

But $P(X_2=6\mid X_1\in\{3,4\},X_0=2) =0$

3 Answers


Introduction: Natural language represented using Markov model

A very simple and practical example would be that of natural language.

Now, first of all, natural language is just a term for any language you speak and have acquired or learned, such as English. When you communicate, you have a set of words $W$ and a set of operations you can perform on these words. You then arrange the words you know in a certain order to produce meaningful units of communication, i.e., "sentences".

When you use words in sentences, they are obviously not placed arbitrarily; they follow a certain order. In fact, sentence formation can be well modeled as a Markov chain of order $k=1$ (reference).


Sentence formation and conditional probability

Now identify your six possible states $\{1,\dots,6\}$ with six particular words. It can be determined experimentally that $P(C \mid B) \geq P(C \mid A,B)$, where $A,B$ denotes an ordered occurrence of the words $A$ and $B$ in a sentence. This relation holds in a natural language because $P(C \mid B)$ already accounts for all possible trigrams of the form "$X\,B\,C$", where $X$ is any word of the language. In computational linguistics, this is also called a transition probability.

I'll give you an example of that as well:
Let let us, however, consider the backward probability $P(A|$"$A,B$"$)$ i.e., the probability that $A$ is the word processing $B$. Now let $A$ be "chocolate" and $B$ be "chip". Now there will exist a certain probability in the historical literature of the language for $P(A|$"$A,B$"$) = p_{k=1}$. However, let us now consider another word $C$, that represents "cookie". When you compute the new probability, $P(A|``A,B,C")=p_{k=2}$, then you find that $p_{k=2} \leq p_{k=1}$, because there are more words that you can say after saying "chocolate chip" other than simply "cookie". I have denoted it as "$\leq$" because theoretically it is possible that only one such meaningful triplet can be formed in a natural language, although that's rarely ever the case. You cohld simply replace is with a strict inequality relation for all practical purposes.


I think this provides a sufficient as well as intuitive example for the problem that you have stated. Let me know if anything in the answer is unclear.


Suppose that $X_0$ is equally likely to be in states $1$ and $2$, each with probability $\frac{1}{2}$.

Suppose that from state $3$, there is a $100\%$ chance of going to state $6$ in the next step.

Suppose that from state $4$, there is a $100\%$ chance of going to state $5$ in the next step.

Suppose that from state $1$, there is a $100\%$ chance of going to state $3$.

Suppose that from state $2$, there is a $100\%$ chance of going to state $4$.


Now, the LHS of your equation will evaluate to $0$, because given that you were in state $2$ at $X_0$, you must go to state $4$ in $X_1$ and therefore state $5$ in $X_2$, which means there's a $0\%$ chance of being in state $6$ at $X_2$.

The RHS of your equation will evaluate to $\frac{1}{2}$, because we are not conditioning on any value of $X_0$, so we use its initial probabilities: $0.5$ chance of $X_0=1$ and $0.5$ chance of $X_0=2$. This means that there's a $0.5$ chance of $X_2=6$ and a $0.5$ chance of $X_2=5$.


It will probably help you to draw the directed graph representing the Markov chain's transition probabilities. For good practice, also try to write out the transition matrix (it will be a $6\times 6$ matrix).
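The whole construction can also be checked numerically. The following sketch writes out the $6\times 6$ transition matrix described above (I've made states $5$ and $6$ absorbing, an assumption just to complete the matrix; it doesn't affect anything up to time $2$) and evaluates both conditional probabilities by enumerating length-two paths:

```python
# Transition matrix for the chain above; states 1..6 map to indices 0..5.
# States 5 and 6 are made absorbing purely to complete the matrix (an
# assumption -- their behavior after time 2 doesn't matter here).
P = [[0.0] * 6 for _ in range(6)]
P[0][2] = 1.0  # 1 -> 3
P[1][3] = 1.0  # 2 -> 4
P[2][5] = 1.0  # 3 -> 6
P[3][4] = 1.0  # 4 -> 5
P[4][4] = 1.0  # 5 -> 5 (absorbing)
P[5][5] = 1.0  # 6 -> 6 (absorbing)

mu0 = [0.5, 0.5, 0, 0, 0, 0]  # X0 is uniform on states 1 and 2

states = range(6)

# RHS: P(X2 = 6 | X1 in {3,4}), enumerating all paths (X0, X1, X2)
num = sum(mu0[a] * P[a][b] * P[b][5] for a in states for b in (2, 3))
den = sum(mu0[a] * P[a][b] for a in states for b in (2, 3))
rhs = num / den

# LHS: P(X2 = 6 | X1 in {3,4}, X0 = 2) -- state 2 is index 1
num = sum(mu0[1] * P[1][b] * P[b][5] for b in (2, 3))
den = sum(mu0[1] * P[1][b] for b in (2, 3))
lhs = num / den

print(lhs, rhs)  # 0.0 0.5
```

The two sides disagree, which is exactly what the exercise asks for.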

  • I think your solution works. Do you think the example I provided in the problem description would work? (2017-02-23)
  • What example? Edit: oh, I see you added a picture; I'll look at it. (2017-02-23)

The key point here is that the Markov property requires you to KNOW the value of the chain at the previous time in order for the future to be independent of prior history; just having some information about the previous time isn't enough to break the dependence.

Try thinking of transition matrices in which the following conditions hold:

  1. You are very likely to reach state 3 and unlikely to reach state 4 from a randomly chosen starting position, but when you start specifically from state 2 it is reversed (very unlikely to reach state 3, very likely to reach state 4).

  2. The probability of transitioning to state 6 from state 3 is very different from the probability of transitioning to state 6 from state 4.
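As a sketch, here is one concrete matrix satisfying both conditions; the specific values ($0.9/0.1$, with $X_0$ uniform on states $1$ and $2$) are my own choice, and both conditional probabilities are evaluated by enumerating paths:

```python
# States 1..6 map to indices 0..5. The 0.9/0.1 values are one assumed
# choice of numbers satisfying the two conditions above.
P = [[0.0] * 6 for _ in range(6)]
P[0][2], P[0][3] = 0.9, 0.1   # from 1: state 3 likely, state 4 unlikely
P[1][2], P[1][3] = 0.1, 0.9   # from 2: reversed
P[2][5], P[2][4] = 0.9, 0.1   # from 3: state 6 is very likely
P[3][5], P[3][4] = 0.1, 0.9   # from 4: state 6 is very unlikely
P[4][4] = P[5][5] = 1.0       # 5 and 6 absorbing (to complete the matrix)

mu0 = [0.5, 0.5, 0, 0, 0, 0]  # X0 uniform on states 1 and 2

def cond_prob(fix_x0=None):
    """P(X2 = 6 | X1 in {3,4} [, X0 = fix_x0]) by enumerating paths."""
    starts = range(6) if fix_x0 is None else [fix_x0]
    num = sum(mu0[a] * P[a][b] * P[b][5] for a in starts for b in (2, 3))
    den = sum(mu0[a] * P[a][b] for a in starts for b in (2, 3))
    return num / den

print(cond_prob())          # P(X2=6 | X1 in {3,4})         -> 0.5
print(cond_prob(fix_x0=1))  # P(X2=6 | X1 in {3,4}, X0=2)   -> 0.18
```

Here conditioning additionally on $X_0=2$ (index $1$) shifts the probability from $0.5$ to $0.18$, so knowing only that $X_1\in\{3,4\}$ does not screen off the past.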