For the data presented in Table 4.15 and Figure 4.14, five different random inputs are examined for every input size and the compression ratio is averaged. Data sets ranging from 1000 to 99000 nodes are examined, in increments of 1000 nodes.
The experimental data shown in Table 4.16 and Figure 4.17 includes only one data set per input size, due to the length of the experiment.
As can be seen from Tables 4.15 and 4.16, the difference in compression ratio, expressed in bits per word, is negligible. For input sets between 1000 and 99000 nodes, the minimum and maximum differences are 0.0731 and 0.0947, respectively. For input data sets between 100000 and 1000000 nodes, the difference ranges from 0.0901 to 0.0943. The difference in the average length of the encoding word stabilizes around 0.09 bits per word, and the graphs show no tendency for it to grow or shrink. This gives us enough confidence to expect that for any larger input data set the observed difference will remain close to 0.09 bits per word.
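The metric being compared can be stated explicitly: for a code in which the i-th symbol has codeword length l_i and frequency f_i, the average encoding length in bits per word is the frequency-weighted mean of the codeword lengths. The following minimal Python sketch computes it; the code lengths and frequencies below are made-up illustration values, not the experimental data from the tables.

```python
def bits_per_word(code_lengths, frequencies):
    """Frequency-weighted average codeword length (bits per word)."""
    total = sum(frequencies)
    return sum(l * f for l, f in zip(code_lengths, frequencies)) / total

# Hypothetical example: compare two prefix codes over the same six-symbol alphabet.
freqs = [45, 13, 12, 16, 9, 5]
huffman_lengths = [1, 3, 3, 3, 4, 4]    # assumed Huffman codeword lengths
hu_tucker_lengths = [2, 3, 3, 2, 4, 4]  # assumed alphabetic (Hu-Tucker) codeword lengths

diff = bits_per_word(hu_tucker_lengths, freqs) - bits_per_word(huffman_lengths, freqs)
print(f"difference: {diff:.4f} bits per word")
```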
The Huffman algorithm has a much simpler implementation than the Hu-Tucker algorithm. However, the tree built by the Huffman algorithm can only be used for encoding, and an additional hashing mechanism is needed for the decoding process, especially for alphabets containing millions of nodes.
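To illustrate the decoding issue, the sketch below shows one possible arrangement, assumed for illustration rather than taken from the implementation used in the experiments: a standard heap-based Huffman construction produces an encoding table, and decoding is handled through a hash map from codewords back to symbols.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code table {symbol: bitstring} from {symbol: frequency}."""
    tiebreak = count()  # keeps heap entries comparable when frequencies are equal
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse into children
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[node] = prefix or "0"
    assign(heap[0][2], "")
    return codes

# Hypothetical toy alphabet and frequencies.
freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
encode_table = huffman_code(freqs)
decode_table = {code: sym for sym, code in encode_table.items()}  # hash map used for decoding

encoded = "".join(encode_table[s] for s in "abacdf")

# Decoding scans the bit stream and looks each candidate prefix up in the hash map;
# because the code is prefix-free, the first match is the correct codeword.
decoded, buf = [], ""
for bit in encoded:
    buf += bit
    if buf in decode_table:
        decoded.append(decode_table[buf])
        buf = ""
print("".join(decoded))  # -> "abacdf"
```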