
I've designed a quadcopter and printed it on a 3D printer. Now I need to control it.

I have formulated an MDP (Markov decision process) and want the quadcopter to learn to hold a stable hovering position.

I have an on-board INS (inertial navigation system) that outputs position $(x,y,z)$, velocity $(\dot{x}, \dot{y}, \dot{z})$, attitude $(\phi,\theta,\psi)$, and attitude rates $(\dot{\phi}, \dot{\theta}, \dot{\psi})$.

Therefore, the MDP is formulated as follows:

States:

  • $x$
  • $y$
  • $z$
  • $\dot x$
  • $\dot y$
  • $\dot z$
  • $\phi$
  • $\theta$
  • $\psi$
  • $\dot\phi$
  • $\dot\theta$
  • $\dot\psi$

Actions:

  • rotor1
  • rotor2
  • rotor3
  • rotor4

Rewards:

  • 1 if hovering; 0 otherwise
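
For concreteness, here is a minimal sketch of how I might represent this MDP in Python. The names, the state slicing order, and the hover tolerance `tol` are my own placeholders, not anything dictated by the INS or any library:

```python
import numpy as np

# 12-dimensional continuous state and 4 rotor commands (illustrative only)
STATE_DIM = 12   # x, y, z, xd, yd, zd, phi, theta, psi, phid, thetad, psid
ACTION_DIM = 4   # one thrust command per rotor

def reward(state, hover_position, tol=0.05):
    """Return 1 if 'hovering': near the target with ~zero velocity and rates."""
    pos_err = np.linalg.norm(state[0:3] - hover_position)
    speed = np.linalg.norm(state[3:6])
    rates = np.linalg.norm(state[9:12])
    return 1.0 if (pos_err < tol and speed < tol and rates < tol) else 0.0
```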

The trouble is, I don't know how to get a transition probability matrix. Do I estimate it from flight data? What's the best way to go about this?
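
To make "from flight data" concrete, here is a minimal sketch of one option I'm considering: instead of a tabular transition matrix, fit a linear dynamics model by least squares from logged (state, action, next state) triples. The function name, array shapes, and the linearity assumption are all mine:

```python
import numpy as np

def fit_linear_dynamics(S, A, S_next):
    """Least-squares fit of s' ~= F @ [s; a] from logged flight data.

    S:      (N, 12) array of states
    A:      (N, 4)  array of rotor commands
    S_next: (N, 12) array of successor states
    """
    X = np.hstack([S, A])                           # (N, 16) regressors
    W, *_ = np.linalg.lstsq(X, S_next, rcond=None)  # (16, 12) coefficients
    return W.T                                      # (12, 16): s' ~= F @ [s; a]
```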

Is it necessary to build a quadcopter model in a physics simulator? (I've never done this before.)
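
In case it clarifies what I mean by a simulator, here is a toy sketch of a single Euler integration step for the translational dynamics only. The mass, time step, and small-angle thrust model are placeholders, yaw is ignored, and the angular dynamics are omitted entirely:

```python
import numpy as np

G, M, DT = 9.81, 0.5, 0.01   # gravity (m/s^2), mass (kg, placeholder), step (s)

def step(state, rotor_thrusts):
    """One Euler step of a toy quadcopter model (yaw and torques ignored)."""
    pos, vel = state[0:3], state[3:6]
    att, rates = state[6:9], state[9:12]
    phi, theta = att[0], att[1]
    thrust = np.sum(rotor_thrusts)          # total thrust along the body z-axis
    # world-frame acceleration of the thrust vector tilted by roll/pitch
    acc = (thrust / M) * np.array([
        np.cos(phi) * np.sin(theta),
        -np.sin(phi),
        np.cos(phi) * np.cos(theta),
    ]) - np.array([0.0, 0.0, G])
    new_pos = pos + DT * vel
    new_vel = vel + DT * acc
    new_att = att + DT * rates              # angular dynamics omitted for brevity
    return np.concatenate([new_pos, new_vel, new_att, rates])
```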

Many thanks in advance,

Comments:

  • There is not enough information to answer this question; however, if possible you should consider making your rewards $+\infty$ for hovering and $-\infty$ otherwise. Depending on the type of learning model you use, the basis functions need not be bounded (e.g. a polynomial neural network). Letting the outputs be unbounded (when possible) generally leads to better behavior. (2012-09-18)
  • I'll make it $+\infty$ / $-\infty$... (2012-09-18)

1 Answer