
Information theory is not at all my field of expertise, so maybe my question will be a bit naive.

As said in the title, I would like to quantify the information gained from a new piece of information.

For instance, if I have a binary random event:

$P(X=0)=0.9$

$P(X=1)=0.1$

If I finally learn that $X=0$ (situation 1), I do not gain much information. But if I learn that $X=1$ (situation 2), the situation changes a lot (I gain a lot of information?).

However, if I compute the difference in entropy between the original situation and the final one, I get the same difference in both cases: before learning the outcome the entropy is $H(X)=-0.9\log_2 0.9-0.1\log_2 0.1\approx 0.47$ bits, and after learning it the entropy is $0$, regardless of which outcome I observed.

If I say that receiving this new information is itself a random event $S$ (with probability $0.5$), such that:

Situation 1:

$P(S=0, X=0) = 0.9 \times 0.5 = 0.45$

$P(S=0, X=1) = 0.1 \times 0.5 = 0.05$

$P(S=1, X=0) = 0.5$

$P(S=1, X=1) = 0$

Situation 2:

$P(S=0, X=0) = 0.9 \times 0.5 = 0.45$

$P(S=0, X=1) = 0.1 \times 0.5 = 0.05$

$P(S=1, X=0) = 0$

$P(S=1, X=1) = 0.5$

The mutual information between the random event $S$ and $X$ is bigger in situation 2, which matches the intuition that the information gain is larger in situation 2.
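To sanity-check this numerically, here is a small Python sketch of my own (not part of any library) that computes $I(S;X)$ from the two joint tables written above:

```python
# Quick numerical check of the claim above.
# Each joint distribution is a dict mapping (s, x) -> probability.
from math import log2

def mutual_information(joint):
    """I(S;X) in bits for a joint distribution over (s, x) pairs."""
    p_s, p_x = {}, {}
    for (s, x), p in joint.items():
        p_s[s] = p_s.get(s, 0.0) + p
        p_x[x] = p_x.get(x, 0.0) + p
    return sum(p * log2(p / (p_s[s] * p_x[x]))
               for (s, x), p in joint.items() if p > 0)

situation1 = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.5, (1, 1): 0.0}
situation2 = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.0, (1, 1): 0.5}

print(mutual_information(situation1))  # roughly 0.05 bits
print(mutual_information(situation2))  # roughly 0.76 bits
```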

Still, it seems a bit strange to me to call $S$ a random event and to arbitrarily set its probability to $0.5$... (but it is the best I have found so far).

Could you please tell me if there is a more standard way to deal with this situation, and quantify the gain of information?

1 Answer


I think this question is answered by something even simpler than entropy: self information (also called surprisal).

Recall that the entropy of a discrete random variable $X$ with probabilities $P(X=x_i)=p_i$ is given by $H(X)=\sum_i p_i\log_2(1/p_i)$. We can also write this as $H(X)=E_p[\log_2(1/p_i)]$, and the quantity we are averaging is the self-information of the value $x_i$, given by $\log_2(1/p_i)=-\log_2(p_i)$.

So in our case the self-information of $0$ is $-\log_2(0.9)\approx 0.15$ bits and the self-information of $1$ is $-\log_2(0.1)\approx 3.32$ bits. So learning $1$ instead of $0$ gives us approximately $3.17$ more bits of information.
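As a quick check, a few lines of Python (a minimal sketch, assuming base-2 logarithms so the unit is bits) reproduce these numbers:

```python
# Sanity check of the self-information values above.
from math import log2

def self_information(p):
    """Surprisal of an outcome with probability p, in bits."""
    return -log2(p)

i0 = self_information(0.9)  # about 0.15 bits
i1 = self_information(0.1)  # about 3.32 bits
print(i0, i1, i1 - i0)      # the difference is about 3.17 bits
```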

  • If we have a model of the whole situation, then you might also be interested in the conditional entropy (see the chain rule section of the Wikipedia article for a good way to think about it).
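To make the comment concrete, here is a small sketch of my own that reuses situation 2's joint table from the question and verifies the chain-rule identity $I(S;X) = H(X) - H(X\mid S)$:

```python
# Conditional entropy and the chain rule, using situation 2's joint table.
from math import log2

joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.0, (1, 1): 0.5}  # (s, x) -> prob

# Marginal distributions of S and X.
p_s = {s: sum(p for (si, _), p in joint.items() if si == s) for s in (0, 1)}
p_x = {x: sum(p for (_, xi), p in joint.items() if xi == x) for x in (0, 1)}

# H(X): entropy of the marginal of X.
h_x = -sum(p * log2(p) for p in p_x.values() if p > 0)

# H(X|S) = -sum over (s, x) of P(s, x) * log2( P(x|s) ).
h_x_given_s = 0.0
for (s, x), p_sx in joint.items():
    if p_sx > 0:
        h_x_given_s -= p_sx * log2(p_sx / p_s[s])

print(h_x, h_x_given_s, h_x - h_x_given_s)  # the last value equals I(S;X)
```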