Next: Fast Implementation Time Complexity Up: Phase I Previous: Phase I

## Fast Implementation Model

As explained in the previous chapter, Phase I of the Hu-Tucker algorithm combines compatible nodes from the initial sequence of weights until only one node is left. Two or three Huffman sequences could be merged as a result of the combination. The implementation model that I will build was suggested by Knuth in [Knu73b], page 444.

Let the initial sequence contain N terminal nodes. Two abstract data structures will be needed to describe the engine of the algorithm in
Phase I: an array and a priority queue. The engine components are: two arrays T[N] and A[N]; one Master Priority Queue (MPQ); and N-1 Huffman Sequence Priority Queues, also referred as Huffman Priority Queues (HPQ).
T[N] is an array holding the initial sequence of terminal nodes. Each element of the array T[N] must at least keep:

• The letter of the alphabet.
• The type of the element: terminal or internal node.
• The weight3.2 of the letter.
• Each terminal node should also know the two priority queues in which it participates, i.e., each terminal node should be able to find in O(1) time those elements of the MPQ, which refer to priority queues holding the two Huffman sequences which are delimited by this terminal node. See Figure 3.1.
The second array, A[N], represents the current working sequence. Each element of A[N] holds the address of the corresponding node of the working sequence. Initially all the nodes in the working sequence are terminal nodes. Thus the elements of A[N] hold the corresponding addresses of the terminal nodes. As the Combination phase is in progress A[N] will point to both terminal and internal nodes. The number of nodes in the working sequence steadily decreases. Thus some of the elements of A[N] will become null. Initially the relationship between A[N] and T[N] is as follows:

;
For example, assume that the weight distribution of the nodes in the initial sequence is such that the first combination that ought to be performed is the pair T[1] and T[2]. After the combination the working sequence has changed: the terminal nodes T[1] and T[2]no longer participate in, thus the array A[N] has changed its state:

, I[1] - new node;
A[2] = null.

The rest of the elements of the array remain unchanged. See Figure 3.2.

The HPQs hold the elements of the Huffman sequences at any stage of the Phase I of the algorithm. Each individual priority queue represents one Huffman sequence. Initially there are N-1 HPQs, and each of them holds two adjacent terminal nodes from the initial sequence. Let n denotes the number of nodes in the queue. The priority queue data structure representing the HPQ must be able to perform the following operations:

• Report in time the two least elements of the priority queue and their positions in the working sequence.
• It must be able to find in time the two terminal nodes, which define the boundary of the Huffman sequence it points to.
• Delete the minimum and arbitrary element of the priority queue in time.
• Merge two HPQs in time.
The MPQ has as many elements as the current number of Huffman sequences, i.e., the number of the HPQs. The MPQ is responsible to choose a compatible pair from the current working sequence according to the rules stated in Section 2.2.1.
Each element of the MPQ must at least hold the sum of the two least elements from each individual HPQ, and their positions3.3, designated as
MinSum, i and j, on Figures 3.1 and 3.2.
Each element of the MPQ has a reference to the root of the HPQ it represents. The elements of the MPQ should not change their physical locations, since each terminal node from the initial sequence has a reference to those elements of the MPQ, that are representing the Huffman sequences to which the terminal node participate. If a terminal node is to be combined with another node, the two Huffman sequences it belongs to will be merged, thus those elements of the MPQ must be deleted, and the corresponding priority queues merged. The merge will generate a new Huffman sequence. The node representing it will be inserted to the MPQ. All the priority queue operations must be performed in at most O(1) or time to achieve a reasonable performance.

To summarize the MPQ must provide the following operations:

1.
Insert an element to the queue in time.
2.
Delete the minimum element of the queue in O(1) time.
3.
Delete an arbitrary element of the queue in time.
4.
The physical locations of the priority queue elements should not change.

Figures 3.1 and 3.2 present the engine of the algorithm in Phase I. For simplicity the author has assumed that the first three weights of the initial sequence are the smallest. Thus the HPQ representing the Huffman sequence (w1,w2) has the smallest combined sum and as a result the MPQ's element representing it is on the top of the MPQ. Figure 3.2 illustrates the state after the first combination is performed. Note the change of the state of all components of the engine after the combination.

Figure 3.3 represents an example of Phase I of the Hu-Tucker algorithm applied on the following sequence of weights: 10, 2, 2, 1, 1, 4, 15, 17, 25, after performing three combinations. Following is a detailed description of the first three iterations of the algorithm. The MPQ has selected the pair of nodes T[4] and T[5] to be combined first. The internal node I[1] is produced as a result of the combination. Also, three priority queues are merged into one and the duplicate terminal nodes participating are deleted. In detail: (T[3], T[4]), (T[4],T[5]), and (T[5], T[6]) priority queues have to be merged. The elements T[4] and T[5] are deleted from all three priority queues prior to the merge. After the clean up followed by the merge the new priority queue produced has the elements (T[3], I[2], T[6]), where I[2]=combine(T[4], T[5]). The MPQ is updated: three elements are deleted and one is inserted. The state of the array A[N] has changed as well: A[4] points to I[1] and A[5] is null. At the second iteration of the algorithm the terminal nodes T[2] and T[3] are selected. The combination produces internal node I[2]. A[2] is redirected to point to I[2], and A[3] is null. Again, two terminal nodes are combined and as a result three priority queues have to be merged into one. Clean up of the duplicate nodes is performed. Three elements of the MPQ are deleted and one is inserted. At the third step the internal node I[1] is combined with the terminal node T[6]. Two priority queues are merged into one: (T[1], I[2], I[3], T[6]) and (T[6], T[7]). Terminal node T[6] is deleted from both of the priority queues prior to the merge. The new internal node I[3] is inserted to the new priority queue. The MPQ and the array A[N] are updated.

In total N-1=9-1=8 combination steps are performed, and only one element is left in the working sequence which is the root of the tree. Thus only the first element of the array A[N] is not null. The tree is built and the Combination phase has terminated. Figure 3.4 illustrates the engine of the algorithm in its final state.

Hu-Tucker Algorithm Phase I, Combination
Initialization
Populate the array T[N] and initialize A[N]
Create and initialize (N-1) HPQs and the MPQ
End:Initialization
Main Loop
for ( ++)
Extract_Min(MPQ): get the two nodes of the working sequence, which
are on the top of MPQ and will have to be combined.
Delete those two nodes from all HPQs where they participate
Merge all those HPQs into a new , if necessary
Update the MPQ: delete the merged HPQs from MPQ
Combine the two nodes into a new node
Update the array A[N]
Insert the new node into the
Insert the to MPQ
End: Main Loop
End: Phase I Completed

#### Footnotes

... weight3.2
The weight is a representation of the frequency of usage of the letter in the alphabet.
... positions3.3
This information is needed to resolve ties. See Section 2.2.1.

Next: Fast Implementation Time Complexity Up: Phase I Previous: Phase I
Sashka Davis;961;icsg6;
1999-01-14