
I've always used Shannon's entropy for measuring uncertainty, but I wonder why a logarithmic approach is used. Why shouldn't uncertainty be linear?

For instance, consider the following pairs of distributions:

$$ \left[0.9, 0.1\right], \left[0.99, 0.01\right] $$

$$ \left[0.4, 0.6\right], \left[0.49, 0.51\right] $$

Then you have the following uncertainty measures: $$ H([0.9, 0.1]) = 0.46899559358928122\\ H([0.99, 0.01]) = 0.080793135895911181\\ H([0.4,0.6]) = 0.97095059445466858\\ H([0.49, 0.51]) = 0.9997114417528099 $$
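These values can be reproduced with a minimal sketch (the numbers above correspond to base-2 logarithms, i.e. entropy measured in bits):

```python
import numpy as np

def shannon_entropy(p, base=2):
    """Shannon entropy of a discrete distribution p, in units of log(base)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                        # 0 * log(0) is taken to be 0
    return -np.sum(p * np.log(p)) / np.log(base)

for dist in ([0.9, 0.1], [0.99, 0.01], [0.4, 0.6], [0.49, 0.51]):
    print(dist, shannon_entropy(dist))
```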

With the following example I just want to show that it doesn't satisfy linearity:

$$ H([0.9, 0.1]) - H([0.99, 0.01]) \simeq 0.3882 $$

$$ H([0.49, 0.51]) - H([0.4, 0.6]) \simeq 0.02876 $$

As we can see, for distributions closer to $[1,0]$ or $[0,1]$ the entropy tends to zero faster.

Maybe this is more of a philosophical question, but I think someone could give me alternative measures of uncertainty that are linear or, at least, provide some explanation of the rationale behind this approach.

Thanks!

EDIT I don't mean linearity on the whole space but on the intervals $[0,\frac{1}{2}]$ and $[\frac{1}{2}, 1]$, since, as @r.e.s. comments, it is a required property of such a measure that $f(0) = f(1) = 0$.
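For illustration only (this is not something proposed in the thread): in the two-category case, one measure with exactly these properties is a "tent" function that vanishes at the endpoints and is linear on each half-interval:

```python
def tent_uncertainty(p):
    """Piecewise-linear uncertainty of a binary distribution [p, 1-p]:
    0 at p = 0 and p = 1, maximal (= 1) at p = 1/2, linear on each half."""
    return 2 * min(p, 1 - p)

for p in (0.9, 0.99, 0.4, 0.49):
    print([p, round(1 - p, 2)], tent_uncertainty(p))
```

Under this measure the two differences from the example above become equal (both $0.18$), since shifting a binary distribution by $0.09$ changes the measure by the same amount anywhere within one half-interval.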

  • Why do you expect $ H(X) = -\sum_{i=1}^n {p(x_i) \log_b p(x_i)} $ to be linear?
  • I don't understand how the difference between the entropy of two different distributions measures uncertainty. Uncertainty of what? Is *uncertainty* a well-defined technical term in information theory that I am unaware of?
  • @draks: I know that $H(X)$ is not linear... I wonder if there is another approach to measure uncertainty that is actually linear.
  • @RahulNarain: No, the difference doesn't measure the uncertainty; it was just an example of the non-linearity. Yes, in information theory the uncertainty of a discrete random variable is defined as the entropy of its distribution. More info: [Entropy](http://en.wikipedia.org/wiki/Entropy_(information_theory))
  • According to Wikipedia, [linearity](http://en.wikipedia.org/wiki/Linearity) of a function means that $f(x)$ has the properties 1. Additivity (also called the superposition property): $f(x + y) = f(x) + f(y)$, and 2. Homogeneity of degree 1: $f(\alpha x) = \alpha f(x)$ for all $\alpha$. Neither is fulfilled, so it's not linear.
  • @draks: that's right. We agree that the function $H(X)$ doesn't satisfy this property. But my question is whether someone has ever measured uncertainty with a different function (a linear one, for instance). Why should uncertainty be described by a non-linear function? Do you understand what I mean?
  • Now I see, and can say: I don't know. What do you need it for? *(Satisfying your curiosity is a valid answer.)*
  • My research is about search engines. Given a query, I have a set of semantic categories and, for each category, the probability that the query belongs to it. So I measure the ambiguity of classifying the query into a category by evaluating the Shannon entropy of its probability distribution. But I have observed this non-linearity empirically and, yes, then I felt very curious.
  • Do you mind spelling out your notation a bit? Is $[a,b]$ a uniform distribution on that interval? If so, what generalization of the Shannon entropy to continuous variables are you using?
  • $distrib(q) = [a_1, a_2,\ldots, a_n]$ is a probability distribution. In this particular case, imagine that you have $n$ categories. Then $a_i$ is the probability that a given query $q$ belongs to the $i$-th category. Precisely, what I do to measure the uncertainty for that query is: $-\displaystyle\sum_{i=1}^n a_i \,\log_n{a_i}$ (observe that the base of the logarithm is $n$, the number of categories). Why do you suggest that I should use a generalization for continuous variables?
  • @Kits89: If $a=(a_1, \dots, a_n)$ and $b=(b_1, \dots, b_n)$ are two PMFs, do you want your entropy measure to satisfy $H(a+b)=H(a)+H(b)$?
  • @Ashok: Well, not the _entropy measure_ but the _uncertainty measure_. Right, that could be an option..
  • Can you explain why you want an uncertainty measure with linearity? Usually people would rather want _additivity_, i.e., $H(a\star b)=H(a)+H(b)$ where $a\star b= (a_i b_j : i=1,\dots,n,\ j=1,\dots, n)$, which, apart from Shannon entropy, the Rényi entropies have. (A short sketch of the Rényi entropies appears after this comment thread.)
  • Well, I'm not asking for linearity, I'm looking for alternatives to Shannon's entropy. Why? Because I wonder: if the classification of a query uses a method like taking the category that has _maximum likelihood_, why is the difference in uncertainty between $[0.9, 0.1]$ and $[0.99, 0.01]$ much greater than between $[0.6, 0.4]$ and $[0.61,0.39]$? Shouldn't they grow at the same rate? I'll take a look at Rényi entropy..
  • @Ashok: The thing is that the uncertainty of $[0.5,0,0,0,0,0.5]$ should be greater than that of $[0.1,0.2,0.1, 0.1,0.1, 0.4]$, because using the method from my previous comment, in the first case we can't decide the category and in the second case the category is the 6th.
  • I think you want to measure a different aspect. For you, $[0.4,0.4,0.2,0]$ should have less uncertainty than $[0.3, 0.3, 0.3, 0.1]$, right? I.e., the number of maximal entries matters, right?
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/4365/discussion-between-ashok-and-kits89)
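As a rough sketch of the Rényi entropies mentioned in the comments above (a one-parameter family of measures that, like Shannon entropy, are additive over product distributions; Shannon entropy is the limit as $\alpha \to 1$):

```python
import numpy as np

def renyi_entropy(p, alpha, base=2):
    """Renyi entropy of order alpha (alpha > 0), in units of log(base)."""
    p = np.asarray(p, dtype=float)
    if np.isclose(alpha, 1.0):                       # Shannon limit
        p = p[p > 0]
        return -np.sum(p * np.log(p)) / np.log(base)
    return np.log(np.sum(p ** alpha)) / ((1 - alpha) * np.log(base))

for dist in ([0.9, 0.1], [0.99, 0.01], [0.4, 0.6], [0.49, 0.51]):
    print(dist, [round(renyi_entropy(dist, a), 4) for a in (0.5, 2.0, 10.0)])
```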

1 Answer