next up previous contents
Next: Experimental Comparison of the Up: Example Book Encoding Previous: Example Book Encoding

Comparison of Different Compression Algorithms

In the following experiment the compression ratio achieved by the Hu-Tucker algorithm will be compared to the compression ratio of a fixed-length encoding, and two well-known compression programs: the standard Unix compress and gzip. The file containing the electronic version of ``The Complete Works of William Shakespeare'' is subject of the experiment.

Fixed length encoding calculations:
Number of different words = $29338 \Rightarrow 15$ bits are required to encode each word. Total number of words in the file = 1102905 $\Rightarrow$ Encoded File Size = 15x 1102905 = 16543575 bits = 2067946.87 Bytes.

The following table shows the results. The Input file contains the ``The Complete Works of William Shakespeare'', Input File Size = 6977448 Bytes. compression ratio = $\frac{Input\_File\_Size[Bytes]}{Output\_File\_Size[Bytes]}$

Algorithm Operates Output file Compression
command on size [B] ratio
Hu-Tucker words 1308558.2 5.3322
Fixed Length words 2067946.87 3.3741
code      
Lempel-Ziv characters 2224531 3.166
compress      
Lempel-Ziv characters 2070927 3.3692
gzip      
Lempel-Ziv characters 2046942 3.4087
gzip -9      


Both compress and gzip reduced the input file size by at least 70 percent using Lempel-Ziv coding. The gzip command has different options, allowing for changes of the compression ratio or the running time of the program. The default compression level is 6, forth row of the table, and the best compression level is 9, the last row of the table. The Hu-Tucker algorithm outperforms significantly the two compression programs. The reason is that the Hu-Tucker algorithm is applied to encode whole words, while compress and gzip process files on character bases.
The OABST built by the Hu-Tucker defines variable length prefix code which provides considerably better compression than fixed-length codes, see the first two rows of the table.
The performance of the algorithm and the implementation, and the excellent compression ratio show that the Hu-Tucker can successfully be used for applications requiring optimal encoding and decoding on the fly.
One potential application using the Hu-Tucker algorithm is the File Transfer Protocol (FTP). The Hu-Tucker algorithm will provide additional layer in the application level protocol used for the data connection. The OABST built will effectively compress and decompress extremely large data files, respectively on the transmitting and receiving end. Any distributed or Client/Server application4.3 where submissions of large amounts of data take place could also benefit from the Hu-Tucker algorithm to build an OABST, used for encoding and decoding as front-end and back-end during the transmission on both client and server side. It is important to realize that the knowledge of the frequencies of usage of the words in the language, or letters of the alphabet however, will have to be available to both client and server side to allow for the construction of the OABST.



Footnotes

... application4.3
CORBA and Java RMI are good candidates.

next up previous contents
Next: Experimental Comparison of the Up: Example Book Encoding Previous: Example Book Encoding
Sashka Davis;961;icsg6;
1999-01-14