
The whole question is in the title. $p(x)$ is a probability distribution, and $h$ is continuous and monotonic in $p(x)$.

The purpose is to motivate that the "degree of surprise", or the "amount of information", gained by observing a value of a random variable $x$ with distribution $p(x)$ is proportional to $\ln p(x)$. The steps leading to this are sketched in Bishop's "Pattern Recognition and Machine Learning", exercise 1.28; this is the last part.

I can't see why it is so from a constructive point of view; maybe it's obvious? (Of course, verifying that $\ln p$ satisfies the equation is trivial.)

2 Answers

1

The hypothesis is that $h(u^t)=t\,h(u)$, for every nonnegative $u$ and $t$. In particular, every solution $h$ is such that $h(2^t)=t\,h(2)$, for every nonnegative $t$. Hence, $h(z)=h(2)\,\log_2(z)$, for every positive $z$. On the other hand, $h_c:z\mapsto c\,\log_2(z)$ solves the equation, for every real number $c$. Hence, the set of solutions is exactly $\{h_c\,;\,c\in\mathbb R\}$.
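
For the case that matters for probabilities, $p\in(0,1]$, the same argument can be sketched with base $\frac12$ instead of $2$, which keeps the exponent $t$ nonnegative. This is only an illustration of the step above, with $c$ chosen to match the notation $h_c$:

$$\begin{aligned}
p &= \left(\tfrac12\right)^{t} \quad\text{with}\quad t=-\log_2 p\ \geqslant 0,\\
h(p) &= h\!\left(\left(\tfrac12\right)^{t}\right) = t\,h\!\left(\tfrac12\right) = -h\!\left(\tfrac12\right)\log_2 p = c\,\log_2 p, \qquad c:=-h\!\left(\tfrac12\right).
\end{aligned}$$

So on $(0,1]$ the function $h$ is determined by the single value $h\!\left(\tfrac12\right)$, and since $c\,\log_2 p=\frac{c}{\ln 2}\,\ln p$, this is the same as saying that $h$ is proportional to $\ln p$.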

  • 0
    OK, so we fix $p(x)$ at some value $a$, like $\frac 1 2$, and consider all $a^t$; we get $h(a^t)=t\,h(a)$, but how do we get from there to the logarithm? I mean, the logarithm does possess this property, but is it the only function that does? 2012-08-03
  • 0
    Maybe I'm just being silly. I'm thinking of $\ln x$ as $\int_1^x \frac {dt} t$, but I forget whether there is a theorem proving that it is the only function that solves the equation $e^y = x$. 2012-08-03
  • 0
    One does not assume that $p(x)=\frac12$. One only uses the fact that $z=2^{\log_2(z)}$ for every positive $z$. 2012-08-03
  • 0
    A-a-a-a! That's it. Now I get it. Thanks a lot! 2012-08-03
2

This is essentially a functional equation. Suppose the question was to solve $$f(x^k)= k f(x)$$ for all $k$ and positive $x$. Clearly $f(1)=0$ and any logarithmic function provides a solution since $\log_b(x^k) = k \log_b(x)$.

If you prefer, you can write this as $f(x)=C\log_e(x)$, since $C\log_e(x^k) = k\,C\log_e(x)$.
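
As a quick numerical sanity check (a sketch only, not a proof), one can confirm that $C\log_e(x)$ satisfies $f(x^k)=k f(x)$ on a few sample points, while a non-logarithmic function such as the square root does not. The helper name `satisfies_power_rule` and the sample values are assumptions made up for this illustration.

```python
import math


def f(x, C=1.0):
    """Candidate solution f(x) = C * ln(x), defined for x > 0."""
    return C * math.log(x)


def satisfies_power_rule(func, xs, ks, tol=1e-9):
    """Check func(x**k) == k * func(x) on a grid, up to floating-point tolerance."""
    return all(
        math.isclose(func(x ** k), k * func(x), rel_tol=tol, abs_tol=tol)
        for x in xs
        for k in ks
    )


# Hypothetical sample points: positive x on both sides of 1, nonnegative k.
xs = [0.1, 0.5, 1.0, 2.0, 7.5]
ks = [0.0, 0.5, 1.0, 2.0, 3.0]

# Any constant multiple of the logarithm passes the check.
print(satisfies_power_rule(lambda x: f(x, C=-2.5), xs, ks))

# The square root fails it, e.g. sqrt(2**2) = 2 but 2*sqrt(2) is about 2.83.
print(satisfies_power_rule(math.sqrt, xs, ks))
```

Passing such a check on a finite grid only shows that the logarithm is a solution; uniqueness still needs the argument that every positive $x$ is a power of one fixed base.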

  • 0
    Thanks for the answer, but I see two difficulties. 1. $h$ is a function of a function, not a function of a positive real number, and it is only known that the equality in the title holds for any probability distribution $p(x)$. 2. This is a bit of a "backwards" approach: we know the answer and we check that it is, indeed, true. That way it is trivial, and I wouldn't have asked this question if that was what I wanted. I wanted to know whether there is a constructive way to get to $C \ln p$. (I'll edit the original question to reflect that.) Also, how do you ensure that it's the only solution? 2012-08-03
  • 0
    $p(x)$ is presumably a probability or a probability density and so must be a non-negative number and can be treated as such; you will have problems if it is $0$ and $t$ is non-positive. 2012-08-03