5

When will a probabilistic process obtained by an "abstraction" from a deterministic discrete process satisfy the Markov property?

Example #1) Suppose we have some recurrence, e.g., $a_t=a_{t-1}^2$ for $t>0$. This is a deterministic process. However, if we make an "abstraction" by considering only one particular digit of each $a_t$, we obtain a probabilistic process. We wonder whether it satisfies the Markov property or not.

Example #2) Suppose we have a finite state automaton. Now we make an "abstraction" by grouping the states into sets of states, obtaining a probabilistic finite state automaton. We run this automaton over time and wonder whether it satisfies the Markov property or not.

The particular examples are not important; what is of interest are general conditions under which a deterministic process becomes a Markov process after an "abstraction" of the kind above (in any context). I'm looking for any references on this matter.


Edit: as pointed out in the comments below, examples #1 and #2 were not well specified. Now there is a distribution on the starting state $a_0$, and $a_t=f(a_{t-1})$ is a deterministic function. Then $(a_t)_{t\geq 0}$ is a degenerate Markov chain. The question is whether grouping some of the states of such a chain can yield a Markov chain (pointers to literature where such conditions are discussed are welcome).

A more general problem seems to be: "Given any Markov chain (i.e., not a degenerate one) and a grouping of its states into sets, what are the conditions under which the resulting process satisfies the Markov property?"

  • 0
    It depends on the way of abstraction, of course; sometimes the abstraction is chosen precisely so as to produce a Markov process. Unfortunately I cannot give an extended answer now; I will try to do it tomorrow. – 2011-10-25

3 Answers

2

It seems both examples fit into the following setting. One starts from a (deterministic) dynamic system defined by $a_0\in A$ and $a_{t+1}=u(a_t)$ for every nonnegative integer $t$, for a given function $u:A\to A$, and one considers the $X$-valued process $(x_t)_{t\geqslant0}$ defined by $x_t=\xi(a_t)$ for every nonnegative integer $t$, for a given function $\xi:A\to X$.

For every fixed $a_0$, $(x_t)_{t\geqslant0}$ is deterministic, hence $(x_t)_{t\geqslant0}$ is a (quite degenerate) inhomogeneous Markov chain whose transition at time $t$ is the kernel $Q_t$ such that $Q_t(x,y)=P(x_{t+1}=y\mid x_t=x)$ equals $\delta_y(\xi(a_{t+1}))$ if $x=\xi(a_t)$ and is undefined for every $x\ne\xi(a_t)$.

One way to get a truly random process $(x_t)_{t\geqslant0}$ in this setting is to draw $a_0$ at random. But then there is every reason to expect that $(x_t)_{t\geqslant0}$ will not be a Markov chain; in fact, the construction above is a classical way to encode random processes with a complex dependence structure.

One example which might help get a feeling of what is happening is the case when $A=\mathbb R$, $u(a)=a+\frac15$ for every $a\in A$, $a_0$ is uniformly distributed on $(0,1)$, and $\xi:\mathbb R\to\mathbb N$ is the integer-part function. Then $x_{t+1}\in\{x_t,x_t+1\}$ with full probability but $(x_t)_{t\geqslant0}$ is not Markov. However, $x_{t+1}$ is a deterministic function of $(x_{t},x_{t-1},x_{t-2},x_{t-3},x_{t-4})$ (exactly one unit increment occurs in every five consecutive steps, so the last five values reveal the phase), hence $(x_t)_{t\geqslant0}$ is a (degenerate) fifth-order Markov process.
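To see the failure of the Markov property numerically, here is a minimal Python sketch (the sample size and variable names are our choices, not part of the example). It estimates $P(x_6=2\mid x_5=1)$ and $P(x_6=2\mid x_5=1,x_4=1)$, which turn out to differ (roughly $1/5$ versus $1/4$), so conditioning on more of the past changes the prediction:

```python
import numpy as np

# Simulate the rotation example: a_0 ~ Uniform(0,1), a_{t+1} = a_t + 1/5,
# and x_t = floor(a_t).
rng = np.random.default_rng(0)
a0 = rng.uniform(0.0, 1.0, size=1_000_000)
x = lambda t: np.floor(a0 + t / 5).astype(int)

x4, x5, x6 = x(4), x(5), x(6)   # note: x5 == 1 almost surely

# If (x_t) were Markov, conditioning on x_4 in addition to x_5 could not
# change the law of x_6.  It does:
print(np.mean(x6[x5 == 1] == 2))                 # ~ 1/5
print(np.mean(x6[(x5 == 1) & (x4 == 1)] == 2))   # ~ 1/4
```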


Edit A feature preventing $(x_t)_{t\geqslant0}$ from being Markov in the example above is the existence of points $a$ and $a'\ne a$ in $A$ visited by the process $(a_t)_{t\geqslant0}$ such that $\xi(a)=\xi(a')$ but $\xi(u(a))\ne\xi(u(a'))$. This is reminiscent of the condition for a hidden Markov chain to be a Markov chain, which reads as follows.

Assume that $(a_t)_{t\geqslant0}$ is a Markov chain with transition kernel $q$ and let $(x_t)_{t\geqslant0}$ denote the process defined by $x_t=\xi(a_t)$ for every nonnegative $t$. Then $(x_t)_{t\geqslant0}$ is a Markov chain for every starting distribution of $a_0$ if and only if the sum $ q_\xi(a,y)=\sum\limits_{b\in A}q(a,b)\cdot[\xi(b)=y] $ depends on $a$ only through $x=\xi(a)$, that is, if and only if $q_\xi(a,y)=Q(x,y)$ for a given function $Q$. When this condition, called lumpability, holds, the Markov chain $(a_t)_{t\geqslant0}$ is said to be lumpable (by the function $\xi:A\to X$) and $Q$ is the transition kernel of the Markov chain $(x_t)_{t\geqslant0}$.
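In the finite case the lumpability condition can be checked mechanically. Here is a small Python sketch (the helper `is_lumpable` and its test matrix are ours, purely for illustration): for every lump $y$, the total mass $q_\xi(a,y)$ sent into $y$ must be the same from every state $a$ within a given lump.

```python
import numpy as np

def is_lumpable(q, xi, atol=1e-12):
    """Check the lumpability condition for a finite chain: for every lump y,
    q_xi(a, y) = sum_{b: xi(b)=y} q(a, b) must depend on a only through xi(a).
    q is an n-by-n stochastic matrix, xi a length-n array of lump labels."""
    q = np.asarray(q, dtype=float)
    xi = np.asarray(xi)
    lumps = np.unique(xi)
    # q_xi[a, j] = probability of jumping from state a into lump lumps[j]
    q_xi = np.stack([q[:, xi == y].sum(axis=1) for y in lumps], axis=1)
    # within each lump, all rows of q_xi must coincide
    return all(np.allclose(q_xi[xi == y], q_xi[xi == y][0], atol=atol)
               for y in lumps)

# A 3-state chain: states 0 and 1 send the same total mass to each lump of
# the partition {0,1} | {2}, but not for the partition {0} | {1,2}.
q = np.array([[0.2, 0.3, 0.5],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])
print(is_lumpable(q, [0, 0, 1]))  # True
print(is_lumpable(q, [0, 1, 1]))  # False
```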

The question of whether $(x_t)_{t\geqslant0}$ is a Markov chain for a given starting distribution of $a_0$ is more involved, but a condition for this to hold is stated here.


Second edit Here is an example in continuous state space showing that the initial distribution is important.

Let $A=\mathbb R/\mathbb Z$ denote the unit circle, $u:A\to A$ defined by $u(a)=2a$, $X=\{0,1\}$, $\xi:A\to X$ defined by $\xi(a)=[a\in A_1]$ where $A_1=(\mathbb Z+[\frac12,1))/\mathbb Z$, and $a_{t+1}=u(a_t)$ and $x_t=\xi(a_t)$ for every nonnegative $t$. Then, if the distribution of $a_0$ is uniform on $A$, the process $(x_t)_{t\geqslant0}$ is a Markov chain since it is in fact i.i.d. with $x_t$ uniform on $X$ for every $t$.
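Concretely, the bits $x_t$ are exactly the binary digits of $a_0$, and the binary digits of a uniform random number are i.i.d. fair coin flips. A short Python sketch of this (our code; note that each doubling discards one bit of float64 precision, so only the first fifty or so iterates of a floating-point simulation are trustworthy):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, size=200_000)   # a_0 uniform on the circle
T = 20                                    # keep T well below ~50 (float precision)
x = np.empty((len(a), T), dtype=int)
for t in range(T):
    x[:, t] = (a >= 0.5).astype(int)      # xi(a) = [a in [1/2, 1)]
    a = (2.0 * a) % 1.0                   # u(a) = 2a mod 1

print(x.mean(axis=0))                       # every digit ~ Bernoulli(1/2)
print(np.corrcoef(x[:, 5], x[:, 6])[0, 1])  # consecutive digits ~ uncorrelated
```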

This is adapted from the example based on the logistic map presented in these notes by Cosma Shalizi, which goes as follows.

Let $B=[0,1]$, $v:B\to B$ defined by $v(b)=4b(1-b)$, $\eta:B\to X$ defined by $\eta(b)=[b\in B_1]$ where $B_1=[\frac12,1]$, and $b_{t+1}=v(b_t)$ and $y_t=\eta(b_t)$ for every nonnegative $t$. Then, if the distribution of $b_0$ is the arcsine distribution, with density $1/(\pi\sqrt{b(1-b)})$ on $B$, the process $(y_t)_{t\geqslant0}$ is a Markov chain since it is in fact i.i.d. with $y_t$ uniform on $X$ for every $t$.

Shalizi notes that $(y_t)_{t\geqslant0}$ is a Markov chain with respect to its own filtration, since the distributions of $y_{t+1}$ conditionally on $(y_s)_{s\leqslant t}$ or conditionally on $y_t$ are the same (and both are the uniform distribution on $X$). On the other hand, the distribution of $y_{t+1}$ conditionally on $(b_s)_{s\leqslant t}$ is the Dirac measure at $1$ or at $0$, since $y_{t+1}$ is a deterministic function of $b_t$. More precisely, this conditional distribution is the Dirac measure at $\eta(v(b_t))$.
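A quick numerical check of this claim, in the same spirit as the sketches above (our code; a floating-point iteration of a chaotic map is only reliable for a modest number of steps, hence the small horizon):

```python
import numpy as np

rng = np.random.default_rng(0)
# sigma(uniform) has the arcsine law on [0,1] (see the conjugacy below)
b = np.sin(np.pi * rng.uniform(0.0, 1.0, size=200_000)) ** 2
T = 15
y = np.empty((len(b), T), dtype=int)
for t in range(T):
    y[:, t] = (b >= 0.5).astype(int)   # eta(b) = [b in [1/2, 1]]
    b = 4.0 * b * (1.0 - b)            # v(b) = 4b(1-b)

print(y.mean(axis=0))                       # every y_t ~ Bernoulli(1/2)
print(np.corrcoef(y[:, 4], y[:, 5])[0, 1])  # consecutive symbols ~ uncorrelated
```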

Finally, the examples based on $u$ and $v$ are conjugate since $v\circ \sigma=\sigma\circ u$ with $\sigma:A\to B$ defined by $\sigma(a)=\sin^2(\pi a)$. Note that, if $a_0$ is uniform on $A$, then $\sigma(a_0)$ follows the arcsine distribution on $B$.
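The conjugacy identity itself reduces to the double-angle formula $\sin^2(2\pi a)=4\sin^2(\pi a)\cos^2(\pi a)$, and can be sanity-checked numerically (again a sketch, not a proof):

```python
import numpy as np

u = lambda a: (2.0 * a) % 1.0       # doubling map on A = R/Z
v = lambda b: 4.0 * b * (1.0 - b)   # logistic map on B = [0,1]
sigma = lambda a: np.sin(np.pi * a) ** 2

a = np.linspace(0.0, 1.0, 1001)
print(np.max(np.abs(v(sigma(a)) - sigma(u(a)))))  # ~ 1e-15, i.e. zero
```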

  • 0
    If $(a_t)$ is ergodic, what you suggest is equivalent to choosing $a_0$ randomly and at stationarity, that is, the distribution of $a_0$ is the unique probability measure invariant with respect to the action of $u$. – 2011-10-26
1

As @Did points out, you want to know when "lumping" preserves the Markov property. See Lemma 2.5 in Markov Chains and Mixing Times by Levin, Peres, and Wilmer. The book is available online here.

  • 1
    Yes. And Lemma 2.5 covers the *sufficient* condition (which is the simplest part). – 2011-10-30
0

The simple answer is: every discrete deterministic process is already a Markov process. The "probabilities" in question are just restricted to be 0 or 1.

  • 0
    @wu7, if your "abstraction" map $f:A\rightarrow B$ is deterministic, then the result is automatically a Markov chain. (A minor caveat is that you really want your $A$ to have a well-defined probability measure so that the inverse image of $b\in B$ has a well-defined probability.) This actually eliminates your example #1, unless you assume an improper prior on the integers. (I'm assuming $a_t$ is meant to be an integer.) – 2011-10-25