I'm stuck with Utility Theory. As I understand it, we have
- vector $\vec{x}$ representing evidence about the world
- $n$ possible states of the world $S = \{S_{1}, S_{2}, ..., S_{n}\}$
- $m$ possible actions of the agent $\alpha = \{\alpha_{1}, \alpha_{2}, ..., \alpha_{m}\}$
- utility function $U_{ik}$ which returns a number representing the benefit of taking action $\alpha_{i}$ when the state is $S_{k}$
- the expected utility of any action defined as $EU(\alpha_{i}|x) = \sum_{k}U_{ik} \cdot P(S_{k}|x)$
And, say, our agent always chooses the action that maximizes $EU$. That's pretty much it for the theory basics.
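To make the decision rule concrete for myself, here's a minimal Python sketch of the argmax-of-expected-utility rule. The utility table and the probabilities are made-up numbers, just to exercise the formula:

```python
# Utility table U[i][k]: benefit of taking action alpha_i when
# the state of the world is S_k (values are made up).
U = [
    [5.0, -2.0],  # alpha_1
    [1.0,  1.0],  # alpha_2
    [0.0,  3.0],  # alpha_3
]

# Posterior over states given the evidence x: P(S_k | x).
p = [0.7, 0.3]

# EU(alpha_i | x) = sum_k U[i][k] * P(S_k | x)
def expected_utility(i):
    return sum(U[i][k] * p[k] for k in range(len(p)))

eus = [expected_utility(i) for i in range(len(U))]
best = max(range(len(U)), key=lambda i: eus[i])
print(best)  # 0, i.e. alpha_1 (its EU of 2.9 is the highest)
```

So the "program" is trivial once $U$ and $P(S_{k}|x)$ exist; my questions below are all about where those two objects come from.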
Now, I want to get a better understanding of all those abstractions by imagining a real-world scenario and putting some meaningful values/properties into $\vec{x}$, $S_{k}$, etc., so that I can see how to write a program for choosing the best action.
I've seen examples in the literature, but when I tried to come up with my own, questions started bubbling up!
So, my example: an agent has some money and can invest a portion of that into a risky asset. Here's how I think about approaching the solution (with my questions inline):
- Evidence about the world $\vec{x}$ - it may be, say, the asset's worst-case and best-case returns, and the asset's price change over the last week. (Is that OK? Any other good candidates?)
- States of the world $S_{k}$ - it may be, say, the asset's price change between today and yesterday. (Is that OK? Any other good candidates?)
- Actions $\alpha_{i}$ - it may be, say, "invest", "do nothing", and "let a human decide". (Is it OK to have fewer actions than states? How do I incorporate the idea of "invest a portion" here?)
- Utility function $U_{ik}$ - it may be just a function of the agent's current wealth and the asset's returns (say, $\sqrt{w}$, comparing its certainty equivalent to the expected gain), but I believe it should be more intelligent. What's the best way to define it (a proper function? a matrix?), and who should define it (a human expert? learned by an algorithm?)? And how do I express the penalty of deferring the decision to a human? Just by giving that action a smaller utility?
- Probabilities $P(S_{k}|x)$ - with the $S_{k}$ and $\vec{x}$ I've chosen, I think the probabilities could and should be learned by the algorithm, but I'm not sure. What's the best way to define them? Should they be learned by an algorithm or provided by a human expert? Should they be updated over time?
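To show where I currently am, here's how I'd wire the whole investment example together: "invest a portion" becomes one action per fraction, "let a human decide" becomes a fixed utility penalty, and the utility of ending wealth is $\sqrt{w}$. All the numbers, the three-state discretization of returns, and the `DEFER_COST` knob are my own inventions; in a real system `p` would presumably be estimated from data rather than hand-set:

```python
import math

wealth = 100.0

# Discretized states of the world: tomorrow's asset return
# (my own coarse discretization; real bins would be finer).
states = {"drop": -0.20, "flat": 0.00, "rise": 0.30}

# Actions: invest a fraction of wealth (one action per fraction),
# plus deferring to a human at a small fixed utility cost.
fractions = [0.0, 0.25, 0.5, 1.0]
DEFER_COST = 0.05  # made-up penalty for bothering a human

# P(S_k | x): hand-set here; should presumably be learned from x.
p = {"drop": 0.3, "flat": 0.4, "rise": 0.3}

def utility(w):
    return math.sqrt(w)  # risk-averse utility of wealth

def eu_invest(frac):
    # Expected utility of ending wealth after investing `frac` of it.
    return sum(p[s] * utility(wealth * (1 + frac * r))
               for s, r in states.items())

eu = {f"invest {f:.0%}": eu_invest(f) for f in fractions}
eu["defer to human"] = utility(wealth) - DEFER_COST

best = max(eu, key=eu.get)
print(best)  # "invest 100%" with these made-up numbers
```

With these particular numbers full investment wins, since the expected return is positive and $\sqrt{w}$ is only mildly risk-averse at this wealth. But this just pushes my questions into `utility`, `DEFER_COST`, and `p`, which is exactly what I'm asking about above.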
Hopefully I'm not asking too much - thanks!