4
$\begingroup$

Problem Overview

I am to figure out $v_\pi$ of a certain Markov state.

Given Information

A set of actions, $a$ containing ${up, down, left, right}$

$v_\pi(12), v_\pi(13), v_\pi(14)$ (I am given values for these)

$r(...) = -1$ (all returns are -1, regardless of the transition)

$p(...) = 1$ (each action maps to only one resulting state)

$\pi(a|s) = 1/4$ (the probability of transitioning to any state is 1/4)

$\gamma = 1$ (no discounting is being applied)

Problem

Find $v_\pi(15)$, given that transitioning to states $12, 13, 14, 15$ from $15$ is equiprobable.

Relevant Equations

Bellman equation for $v_\pi$

$v_\pi(s)=\sum_a \pi(a|s)\sum_{s'} p(s'|s,a)[r(s,a,s')+\gamma v_\pi(s')]$

A simplified version, given the context of this question, is:

$v_\pi(s)=\sum_a \sum_{s'}(-1 + v_\pi(s'))$


My Approach

I can easily solve this problem where there are only transitions to states $12, 13, 14$ but I'm having a hard time grasping this problem recursively, because at the end of the Bellman equation, when considering the transition to $15$, $v_\pi(s')$ is $v_\pi(15)$.

Therefore, $v_\pi(15)$ depends on $v_\pi(15)$ and leaves me very confused as to how to compute this by hand (I can't just run an insane number of computations until it converges). Should I just do it iteratively until the value doesn't seem to change too much?

Does anyone have any suggestions for me? Help would be greatly appreciated!

  • 0
    I am trying to find $v_\pi(15)$ and within that inner sum, I am enumerating over state $15$ which requires me to calculate $v_\pi(15)$ in my solution to that very same function. The recursion is messing with my mind lol.2012-10-15

1 Answers 1

1

I think you should follow the way Kolmogorov forward equations are solved for a birth and death MCs. If rate of growth is $\lambda$ and extinction is $\mu$, then $ 0=p'_{j}(t)=\mu \pi_{j+1} + \lambda \pi_{j-1} - (\lambda+\mu) \pi_{j} $ hence $ \pi_j=\frac{\mu}{\mu+\lambda}\pi_{j+1} + \frac{\lambda}{\mu+\lambda} \pi_{j-1} $ with the standardizing condition $\sum_{k \geq 0} \pi_k =1$. so you should eng up getting $\pi_0=1-\frac{\lambda}{\mu}$, $\pi_j = \big(\frac{\lambda}{\mu} \big)^j (1-\frac{\lambda}{\mu})$, i.e. Geometric distribution