I am completely lost with a homework assignment. Can someone help me out?
The dynamical system is described as follows:
$\begin{align} dx &= v\,dt\\ dv &= F_g(x)\,dt + u\,dt + d\xi \end{align}$
Here, $F_g(x) = -g \frac{L'(x)}{\sqrt{1 + L'^2(x)}}$ and $L(x)=-1 -\frac{1}{2}(\tanh(2x + 2) - \tanh(2x - 2))$. $d\xi$ is noise with $\langle d\xi^2 \rangle = \nu$.
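To make the force term concrete, here is a minimal sketch of $L$, $L'$ and $F_g$ in Python. The function names and the value $g = 9.81$ are my own assumptions (the assignment does not give $g$); the derivative uses $\frac{d}{dx}\tanh(ax+b) = a\,(1 - \tanh^2(ax+b))$.

```python
import numpy as np

G = 9.81  # assumed value; the assignment does not specify g


def landscape(x):
    """L(x) = -1 - 1/2 * (tanh(2x + 2) - tanh(2x - 2))."""
    return -1.0 - 0.5 * (np.tanh(2 * x + 2) - np.tanh(2 * x - 2))


def landscape_slope(x):
    """L'(x), using d/dx tanh(2x + b) = 2 * (1 - tanh(2x + b)**2)."""
    sech2_plus = 1.0 - np.tanh(2 * x + 2) ** 2
    sech2_minus = 1.0 - np.tanh(2 * x - 2) ** 2
    # The factor -1/2 in L and the factor 2 from the chain rule combine to -1.
    return -(sech2_plus - sech2_minus)


def gravity_force(x, g=G):
    """F_g(x) = -g * L'(x) / sqrt(1 + L'(x)**2)."""
    slope = landscape_slope(x)
    return -g * slope / np.sqrt(1.0 + slope ** 2)
```

Since $L$ is even, $F_g$ is odd: it vanishes at $x = 0$ (the bottom of the valley) and pushes back toward $0$ on both slopes.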
The cost of ending in a particular state at time $t = T$ is defined as follows:
$\begin{align} \phi(x_T)=\begin{cases} -1 & \mbox{if } x_T < -2 \mbox{ or } x_T > 2\\ 0 & \mbox{otherwise} \end{cases} \end{align}$
The total cost is $C = \langle \phi(x_T) + \int_0^T dt \frac{R}{2} u^2 \rangle$, which should be minimized with respect to the control $u$.
The assignment then asks the following:
Let $J$ be the optimal cost-to-go, with $J(x, v, t) = -\lambda \log\left(\frac{1}{n} \sum_{\mu=1}^n \exp\left(\frac{-\phi(x_T^\mu)}{\lambda}\right)\right)$, where I take $x_T^\mu$ to be the end position of the $\mu$-th sample path. Approximate the optimal control $u$ by using MCMC and by running the uncontrolled dynamics $n$ times.
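My current reading of this formula, as a rough sketch only: simulate $n$ uncontrolled paths ($u = 0$) from $(x, v)$ at time $t$ up to $T$, evaluate $\phi$ at each end point, and plug the results into the expression for $J$. Everything below is an assumption on my part: the Euler-Maruyama discretization, a noise variance of $\nu\,dt$ per step, $g = 9.81$, and all function and parameter names. It is plain Monte Carlo sampling rather than a Markov chain.

```python
import numpy as np


def force(x, g=9.81):
    """F_g(x) = -g * L'(x) / sqrt(1 + L'(x)**2); g = 9.81 is an assumed value."""
    slope = -((1.0 - np.tanh(2 * x + 2) ** 2) - (1.0 - np.tanh(2 * x - 2) ** 2))
    return -g * slope / np.sqrt(1.0 + slope ** 2)


def phi(x_T):
    """End cost: -1 outside [-2, 2], 0 otherwise."""
    return np.where((x_T < -2) | (x_T > 2), -1.0, 0.0)


def estimate_J(x0, v0, t0, T, lam, nu, n=1000, dt=0.01, seed=0):
    """Monte Carlo estimate of J(x0, v0, t0) from n uncontrolled sample paths."""
    rng = np.random.default_rng(seed)
    x = np.full(n, float(x0))
    v = np.full(n, float(v0))
    steps = int(round((T - t0) / dt))
    for _ in range(steps):
        # Assumption: Gaussian noise increment with variance nu * dt per step.
        dxi = rng.normal(0.0, np.sqrt(nu * dt), size=n)
        # Euler-Maruyama step with u = 0; the right-hand side uses the old x and v.
        x, v = x + v * dt, v + force(x) * dt + dxi
    # J = -lam * log( (1/n) * sum_mu exp(-phi(x_T^mu) / lam) )
    return -lam * np.log(np.mean(np.exp(-phi(x) / lam)))
```

For example, `estimate_J(0.0, 0.0, 0.0, 2.0, lam=0.1, nu=1.0)` would give the estimate for a particle starting at rest at the bottom of the valley. Because $\phi$ only takes the values $-1$ and $0$, the estimate always lies between $-1$ and $0$; for very small $\lambda$ the exponential can overflow, so a log-sum-exp formulation may be safer.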
The control $u$ is continuous.
Does that simply mean: start at some state $(x, v)$, run the simulation, compute $\phi(x_T)$, and then compute $J$? Furthermore, what would the optimal control be? I know the problem can be solved with the HJB equation, but I cannot find a way to reduce this 2-dimensional problem to a 1-dimensional one.