Next: Experimental Comparison of the
Up: Example Book Encoding
Previous: Example Book Encoding
In the following experiment the compression ratio achieved by
the Hu-Tucker algorithm will be compared to the compression ratio
of a fixed-length encoding, and two well-known compression
programs: the standard Unix compress and gzip.
The file containing the electronic version of ``The Complete Works
of William Shakespeare'' is subject of the experiment.
Fixed length encoding calculations:
Number of different words =
bits are required
to encode each word. Total number of words in the file =
1102905
Encoded File Size = 15x
1102905 =
16543575 bits =
2067946.87 Bytes.
The following table shows the results.
The Input file contains the ``The Complete Works of William Shakespeare'',
Input File Size = 6977448 Bytes. compression ratio =
| Algorithm |
Operates |
Output file |
Compression |
| command |
on |
size [B] |
ratio |
| Hu-Tucker |
words |
1308558.2 |
5.3322 |
| Fixed Length |
words |
2067946.87 |
3.3741 |
| code |
|
|
|
| Lempel-Ziv |
characters |
2224531 |
3.166 |
| compress |
|
|
|
| Lempel-Ziv |
characters |
2070927 |
3.3692 |
| gzip |
|
|
|
| Lempel-Ziv |
characters |
2046942 |
3.4087 |
| gzip -9 |
|
|
|
Both compress and gzip reduced the input file size by
at least 70 percent using Lempel-Ziv coding.
The gzip command has different options, allowing for changes of the
compression ratio or the running time of the program.
The default compression level is 6, forth row of the table,
and the best compression level is 9, the last row of the table.
The Hu-Tucker algorithm outperforms significantly
the two compression programs. The reason is that the Hu-Tucker algorithm
is applied to encode whole words, while compress and gzip
process files on character bases.
The OABST built by the Hu-Tucker defines variable length prefix code
which provides considerably better compression than fixed-length codes,
see the first two rows of the table.
The performance of the algorithm and the implementation,
and the excellent compression ratio show that the Hu-Tucker can
successfully be used for applications requiring optimal
encoding and decoding on the fly.
One potential application using the Hu-Tucker algorithm
is the File Transfer Protocol (FTP). The Hu-Tucker algorithm will
provide additional layer in the application level protocol used
for the data connection. The OABST built will effectively compress
and decompress extremely large data files,
respectively on the transmitting and receiving end.
Any distributed or Client/Server application4.3
where submissions of large amounts of data take place could also
benefit from the Hu-Tucker algorithm to build an OABST, used
for encoding and decoding as front-end and back-end
during the transmission on both client and server side.
It is important to realize that the knowledge of the frequencies
of usage of the words in the language,
or letters of the alphabet however, will have to be available to both
client and server side to allow for the construction of the OABST.
Footnotes
- ... application4.3
- CORBA and Java RMI
are good candidates.
Next: Experimental Comparison of the
Up: Example Book Encoding
Previous: Example Book Encoding
Sashka Davis;961;icsg6;
1999-01-14