4
$\begingroup$

Problem Overview

I am to figure out $v_\pi$ of a certain Markov state.

Given Information

A set of actions, $a$ containing ${up, down, left, right}$

$v_\pi(12), v_\pi(13), v_\pi(14)$ (I am given values for these)

$r(...) = -1$ (all returns are -1, regardless of the transition)

$p(...) = 1$ (each action maps to only one resulting state)

$\pi(a|s) = 1/4$ (the probability of transitioning to any state is 1/4)

$\gamma = 1$ (no discounting is being applied)

Problem

Find $v_\pi(15)$, given that transitioning to states $12, 13, 14, 15$ from $15$ is equiprobable.

Relevant Equations

Bellman equation for $v_\pi$

$v_\pi(s)=\sum_a \pi(a|s)\sum_{s'} p(s'|s,a)[r(s,a,s')+\gamma v_\pi(s')]$

A simplified version, given the context of this question, is:

$v_\pi(s)=\sum_a \sum_{s'}(-1 + v_\pi(s'))$


My Approach

I can easily solve this problem where there are only transitions to states $12, 13, 14$ but I'm having a hard time grasping this problem recursively, because at the end of the Bellman equation, when considering the transition to $15$, $v_\pi(s')$ is $v_\pi(15)$.

Therefore, $v_\pi(15)$ depends on $v_\pi(15)$ and leaves me very confused as to how to compute this by hand (I can't just run an insane number of computations until it converges). Should I just do it iteratively until the value doesn't seem to change too much?

Does anyone have any suggestions for me? Help would be greatly appreciated!

  • 0
    What's $v_{\pi}(15)$? Probability of eventually visiting state 15?2012-10-15
  • 0
    $v_\pi(15)$ is the *value* of being in a state, considering the values of all states that could possibly be transitioned into from that state. I gave the formula for it in my question.2012-10-15
  • 0
    $\gamma=1$ is just a standardizing condition in Kolmogorov forward equation2012-10-15
  • 0
    Hmm, I understand the semantics of the values; I am just unsure of how to solve this, because the $v_\pi(s')$ at the end of the Bellman equation **depends** on the value I'm trying to calculate.2012-10-15
  • 0
    I am trying to find $v_\pi(15)$ and within that inner sum, I am enumerating over state $15$ which requires me to calculate $v_\pi(15)$ in my solution to that very same function. The recursion is messing with my mind lol.2012-10-15

1 Answers 1