2

The whole question is in the title. $p(x)$ is a probability distribution, and $h$ is continuous and monotonic in $p(x)$.

The purpose is to motivate that the "degree of surprise", or the "amount of information", gained by observing a value of a random variable $x$ with distribution $p(x)$ is proportional to $\ln p(x)$. The steps leading to this are sketched in Bishop's "Pattern Recognition and Machine Learning", exercise 1.28; this is the last part.

I can't see why this is so from a constructive point of view; maybe it's obvious? (Of course, checking that $\ln p$ satisfies the condition is trivial.)

2 Answers

1

The hypothesis is that $h(u^t)=t\,h(u)$ for every positive $u$ and every nonnegative $t$. In particular, every solution $h$ satisfies $h(2^t)=t\,h(2)$ for every nonnegative $t$. Writing $z=2^t$, that is, $t=\log_2(z)$, gives $h(z)=h(2)\,\log_2(z)$ for every $z\geqslant1$. The same argument with $u=1/2$ gives $h(z)=-h(1/2)\,\log_2(z)$ for every $z$ in $(0,1]$, which is the range relevant for probabilities. Conversely, $h_c:z\mapsto c\,\log_2(z)$ solves the equation for every real number $c$. Hence, on $(0,1]$, the set of solutions is exactly $\{h_c\,;\,c\in\mathbb R\}$.

  • 0
    A-a-a-a! That's it. Now I get it. Thanks a lot! 2012-08-03
2

This is essentially a functional equation. Suppose the question were to solve $f(x^k)= k f(x)$ for all $k$ and positive $x$. Clearly $f(1)=0$ (take $k=0$), and any logarithmic function provides a solution since $\log_b(x^k) = k \log_b(x)$.

If you prefer, you can write this as $f(x)=C\log_e(x)$, since $C\log_e(x^k) = k C\log_e(x)$.
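For anyone who wants to see this numerically, here is a minimal sketch of a sanity check. It only tests the equation at a few sample points, so it illustrates the claim rather than proving it. The helper names `make_h` and `satisfies_equation`, the sample grids, and the counterexample $1-x$ are my own choices for illustration, not something from Bishop.

```python
import math

def make_h(c):
    """Return h(x) = c * ln(x), a candidate solution (c is an arbitrary constant)."""
    return lambda x: c * math.log(x)

def satisfies_equation(f, xs, ks, tol=1e-9):
    """Check f(x**k) == k * f(x) at every sample point, up to rounding error."""
    return all(
        math.isclose(f(x ** k), k * f(x), rel_tol=tol, abs_tol=tol)
        for x in xs
        for k in ks
    )

xs = [0.05, 0.25, 0.5, 0.9, 1.0]   # sample probabilities in (0, 1]
ks = [0, 0.5, 1, 2, 3.7]           # sample nonnegative exponents

log_like = make_h(-1.0)            # c = -1 gives -ln p
base2 = lambda x: math.log2(x)     # same family, with c = 1 / ln 2
not_log = lambda x: 1.0 - x        # continuous and monotonic, but not a logarithm

ok_log = satisfies_equation(log_like, xs, ks)
ok_base2 = satisfies_equation(base2, xs, ks)
ok_other = satisfies_equation(not_log, xs, ks)
print(ok_log, ok_base2, ok_other)
```

The two logarithmic candidates pass at every sample point, while $1-x$ already fails at $x=1/2$, $k=2$, since $1-1/4 \neq 2\,(1-1/2)$. Changing the base only changes the constant $C$.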

  • 0
    $p(x)$ is presumably a probability or a probability density, so it must be a non-negative number and can be treated as such; you will have problems if it is $0$ and $t$ is non-positive. 2012-08-03