
Consider a list of numbers $\{a_1,a_2,\dots,a_n\}$. After sorting the $n$ numbers in increasing order, by how much does the entropy change?

Update:

Alternatively, we can think of the problem in terms of the number of bits needed to describe the list. For a list sorted in increasing order, $\{a_1', a_2', \dots, a_n'\}$, we can convert it to the d-gap list $\{a_1', a_2'-a_1', \dots, a_n'-a_{n-1}'\}$. With a fixed-length code this takes $n \log (\max_i (a_i'-a_{i-1}') + 1)$ bits, while a list in random order needs $n \log (\max_i a_i + 1)$ bits, which is more than the d-gap list requires.
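
As a rough illustration of this counting argument, here is a small Python sketch (my own addition, not part of the original post; the helper names `bits_raw` and `bits_dgap` are made up for this example) that compares the fixed-length bit cost of a random list with the cost of the d-gap encoding of its sorted version:

```python
import math
import random

def bits_raw(xs):
    """Fixed-length code for the list as-is: n * log2(max(x) + 1) bits."""
    return len(xs) * math.log2(max(xs) + 1)

def bits_dgap(xs):
    """Fixed-length code for the sorted list's d-gaps (the first gap is the first element)."""
    ys = sorted(xs)
    gaps = [ys[0]] + [ys[i] - ys[i - 1] for i in range(1, len(ys))]
    return len(xs) * math.log2(max(gaps) + 1)

xs = [random.randrange(1_000_000) for _ in range(10_000)]
print(f"raw list  : {bits_raw(xs):10.0f} bits")
print(f"d-gap list: {bits_dgap(xs):10.0f} bits")
```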

  • It might make sense; it's just not quite clear what you mean as long as you don't specify some model or something. For instance, the list could already be sorted, but if your model of it doesn't take that into account, you'll need lots of bits to describe it anyway. Then when you sort it and your model now takes into account that it's sorted, you can describe it with fewer (not "less", BTW) bits even though it hasn't changed.

2 Answers


I would argue that under any "natural" formulation of this problem (e.g., the numbers are drawn independently from the same probability distribution), the entropy should decrease by $\lg n!$.
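
A sketch of the standard argument behind this (my addition, assuming the $a_i$ are i.i.d. and almost surely distinct): the unsorted list is determined by the sorted list together with the permutation $\pi$ that sorts it, and given the order statistics that permutation is uniform over the $n!$ possibilities, so

$$
H(a_1,\dots,a_n) \;=\; H(a_{(1)},\dots,a_{(n)}) + H(\pi \mid a_{(1)},\dots,a_{(n)}) \;=\; H(a_{(1)},\dots,a_{(n)}) + \lg n!,
$$

i.e. the sorted list carries exactly $\lg n!$ fewer bits of entropy than the original list.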

  • We don't actually require that the probability distribution be uniform; we just require that the odds of any two $a_i$ being equal are vanishingly small.

This should be a comment to Craig's answer. I agree that a decrease by $\log(n!)$ seems to be the right answer, almost obvious from the "information" point of view, but we don't have a proof yet.

Assuming we have $n$ i.i.d. random variables $x_i$, we know the joint entropy of $(x_1, x_2, \dots, x_n)$ is $n H(x)$. The question amounts to computing the joint entropy of the ordered variables $(y_1, y_2, \dots, y_n)$, where $y_1 = x_{(1)}$ is the first order statistic, etc. This is in principle doable, but I don't see how a general result should be obtained.

To give an example: assume $x_i$ is uniform on $[0,M]$ (continuous variables are trickier with regard to entropy interpretations, but here the math is a little simpler, and as long as we don't change scales we are allowed to compare entropies; this could also be done with discrete variables). We have a joint (differential) entropy of $H_0 = n \log(M)$. The ordered variables $y$ can equivalently be specified by the differences $(z_1, z_2, \dots, z_n)$, where $z_i = y_i - y_{i-1}$ (with $y_0 = 0$). We know that, for large $n$, the $z_i$ are asymptotically independent and exponential with mean $M/n$; the joint entropy then tends to $H_1 = n\,[1-\log(n/M)] \approx - n \log(n) + n \log(M)$. The decrease of entropy that results from ordering is therefore about $n \log(n) \approx \log(n!)$.
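
As a quick numerical sanity check (my addition, using the formulas above with natural logarithms): $H_0 - H_1 = n\log(n) - n$, which by Stirling's approximation should be close to $\log(n!)$. The short script below compares the two.

```python
import math

# Entropy decrease predicted above (natural logs, x_i uniform on [0, M]):
#   H0 = n*log(M),  H1 = n*(1 - log(n/M)),  so  H0 - H1 = n*log(n) - n.
# Stirling: log(n!) = n*log(n) - n + O(log n); computed exactly via lgamma.
for n in (10, 100, 1000, 10_000):
    decrease = n * math.log(n) - n
    log_factorial = math.lgamma(n + 1)  # log(n!) without overflow
    print(f"n={n:>6}:  n*log(n)-n = {decrease:14.2f}   log(n!) = {log_factorial:14.2f}")
```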

  • @Fan Zhang: The entropy of an exponential r.v. is $1 - \log(\lambda)$; see http://en.wikipedia.org/wiki/Exponential_distribution