
I am completely lost with a homework assignment. Can someone help me out?

The dynamical system is described as follows:

$\begin{align} dx &= v dt\\ dv &= F_g(x) dt + udt + d\xi \end{align}$

Here, $F_g(x) = -g \frac{L'(x)}{\sqrt{1 + L'^2(x)}}$ and $L(x)=-1 -\frac{1}{2}(\tanh(2x + 2) - \tanh(2x - 2))$. $d\xi$ is noise with $\langle d\xi^2 \rangle = \nu$.
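These uncontrolled dynamics can be integrated with a simple Euler-Maruyama scheme. Below is a minimal sketch; the values of $g$, $T$, $dt$, and $\nu$ are placeholders (they are not given above), and I interpret the noise as Gaussian with variance $\nu\, dt$ per step:

```python
import math
import random

def L_prime(x):
    # Derivative of L(x) = -1 - 0.5*(tanh(2x+2) - tanh(2x-2)),
    # using d/dy tanh(y) = sech^2(y).
    sech2 = lambda y: 1.0 / math.cosh(y) ** 2
    return -(sech2(2 * x + 2) - sech2(2 * x - 2))

def F_g(x, g=9.81):  # g = 9.81 is an assumed value
    lp = L_prime(x)
    return -g * lp / math.sqrt(1.0 + lp ** 2)

def simulate(x0, v0, T=1.0, dt=1e-3, nu=0.1, u=lambda x, v, t: 0.0):
    """Euler-Maruyama integration of dx = v dt, dv = (F_g(x) + u) dt + dxi,
    assuming <dxi^2> = nu * dt. Returns the final (x, v)."""
    x, v, t = x0, v0, 0.0
    while t < T:
        dxi = random.gauss(0.0, math.sqrt(nu * dt))
        x += v * dt
        v += (F_g(x) + u(x, v, t)) * dt + dxi
        t += dt
    return x, v
```

With the default `u`, this runs the uncontrolled dynamics, which is exactly what the sampling scheme below requires.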

The cost for ending (i.e., at time $t = T$) in a particular state is defined as follows:

$\begin{align} \phi(x_T)=\begin{cases} -1 & \mbox{if } x_T < -2 \mbox{ or } x_T > 2\\ 0 & \mbox{otherwise} \end{cases} \end{align}$

Let $C$ denote the cost and $C = \langle \phi(x_T) + \int_0^T dt \frac{R}{2} u^2 \rangle$. This cost should be minimized with respect to the control $u$.

The assignment then asks the following:

Let $J$ be the optimal cost-to-go, with $J(x, v, t) = -\lambda \log\left(\frac{1}{n} \sum_{\mu=1}^n \exp\left(\frac{-\phi(x_T^\mu)}{\lambda}\right)\right)$, where $x_T^\mu$ is the endpoint of the $\mu$-th sampled trajectory. Approximate the optimal control $u$ by using MCMC and by running the uncontrolled dynamics $n$ times.

The control $u$ is continuous.

Does that simply mean starting at some $(x, v)$, running the simulation, computing $\phi(x_T)$, and then calculating $J$? Furthermore, what would the optimal control be? I know the problem is solvable via HJB, but I cannot find a way to reduce this two-dimensional problem to a one-dimensional one.
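For what it's worth, here is how I understand the sampling step: run $n$ uncontrolled rollouts from $(x, v)$, weight each by $\exp(-\phi(x_T^\mu)/\lambda)$, and read off $J$ from the log of the mean weight. In the path-integral control setting one typically takes $\lambda = \nu R$, and the optimal control at the start can be estimated as the weighted average of the initial noise increments divided by $dt$. A sketch under those assumptions (the values of $T$, $dt$, $\nu$, $R$, $g$, and $n$ are placeholders):

```python
import math
import random

def L_prime(x):
    # Derivative of L(x) = -1 - 0.5*(tanh(2x+2) - tanh(2x-2))
    sech2 = lambda y: 1.0 / math.cosh(y) ** 2
    return -(sech2(2 * x + 2) - sech2(2 * x - 2))

def F_g(x, g=9.81):  # g = 9.81 is an assumed value
    lp = L_prime(x)
    return -g * lp / math.sqrt(1.0 + lp ** 2)

def phi(xT):
    # End cost: -1 for escaping the well (|x_T| > 2), 0 otherwise.
    return -1.0 if (xT < -2 or xT > 2) else 0.0

def estimate_J_and_u(x0, v0, n=1000, T=1.0, dt=1e-3, nu=0.1, R=1.0):
    """Path-integral estimate of J(x0, v0, 0) and the initial control,
    assuming lambda = nu * R and <dxi^2> = nu * dt."""
    lam = nu * R
    weights, first_noise = [], []
    for _ in range(n):
        x, v, t = x0, v0, 0.0
        dxi0 = None
        while t < T:
            dxi = random.gauss(0.0, math.sqrt(nu * dt))
            if dxi0 is None:
                dxi0 = dxi  # remember the first noise increment
            x += v * dt
            v += F_g(x) * dt + dxi  # uncontrolled dynamics: u = 0
            t += dt
        weights.append(math.exp(-phi(x) / lam))
        first_noise.append(dxi0)
    mean_w = sum(weights) / n
    J = -lam * math.log(mean_w)
    # Weighted average of initial noise -> estimate of u(x0, v0, 0).
    u0 = sum(w * d for w, d in zip(weights, first_noise)) / (sum(weights) * dt)
    return J, u0
```

This is only my reading of the assignment, not a verified solution; in particular the relation $\lambda = \nu R$ and the noise-reweighting formula for $u$ come from the standard path-integral control literature, not from the problem statement.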

  • I'm not sure how well you have specified the problem, but I think the idea is the following. For each initial condition $(x_0,v_0)$, you can simulate the SDE with $u=0$ some number of times, compute the cost in each case, and then average the results. This is then an estimate of the cost for that initial condition. You can then find an estimate of the optimal cost (where your optimization parameters are just the initial conditions, not the control, which is fixed) by simply finding the $(x_0,v_0)$ that gave you the lowest cost in the end. (2017-01-29)
  • What I have said won't require anything sophisticated (whereas optimizing the control does). However, I am not 100% certain from what you have said that the control should actually be treated as fixed. (2017-01-29)
  • Thank you for your reply. It will also give a gradient, right? I mean, for each $(x, v)$ pair you will then have an optimal path to follow, and you could compute the optimal control from that? The control is continuous and not fixed; I will add that to the text. (2017-01-29)
  • It might help if you revised the sentence that begins with "Compute $J$". As it stands it sounds like they are just optimizing over $(x_0,v_0)$ when $u$ is taken to be identically zero. But you might mean that you are trying to find the optimal control after the fact, by running the uncontrolled dynamics and then seeing what the control "should have been" along the way. But I am not an expert on control theory, much less stochastic control theory. (2017-01-29)
  • That would make the question more clear indeed; I have added it to the description. Thank you again for your answers. (2017-01-29)
  • I'm still not sure whether "by using MCMC and by running..." is referring to two different simulations, or if it means that your MCMC is just repeated running of the uncontrolled dynamics. (Sorry for spending so much time on clarifying the question and so little time on math...) (2017-01-29)
  • I guess your second thought is the correct one. However, this is also not clear from the assignment. I will ask the professor for clarification. (2017-01-29)
