1
$\begingroup$

The following is an excerpt from CLRS's Introduction To Algorithms:

Although we don't prove it here, a prefix code can always achieve the optimal data compression among any character code, and so we suffer no loss of generality by restricting our attention to prefix codes.

Please answer the question by providing or referring to a valid proof of the above statement.

2 Answers 2

2

The optimal coding referred to here is called Huffman coding. Huffman gave an algorithm which produces a prefix code which is optimal among character codes:

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called "prefix-free codes", that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code.

The Wikipedia article has complete details.

Huffman's original paper, " Method for the Construction of Minimum-Redundancy Codes", contains a proof of optimality.

  • 0
    @David: Yes, but the Huffman paper contains a proof of optimality, not just among prefix codes, but among all character codes. I was lazy and did not summarize the proof, but I did look at the paper to see if it was there.2019-01-26
0

I don't think the above answer is correct. I looked the paper and didn't see the proof that "a prefix code can always achieve the optimal data compression among any character code". In fact he assumed that: enter image description here

The "However, the first L(N-1) digits of the..." is the first time he assumes he is using prefix code.

In fact based on a rudimentary search: in wikipedia page of prefix code, there is no mention that prefix code is optimal among character codes. in wikipedia page of Huffman coding it also only states that this coding method only find best prefix code.

Also based on my intuition, I suspect that this fact is false and the book has made a mistake.