Next: Phase II Level Assignment Up: Detailed Description Previous: Detailed Description

## Phase I, Combination

The goal of this phase is to build an optimal binary tree in which the levels of the terminal nodes of the tree are realizable by the level by level construction2.1 used in phase three.

The initial sequence of nodes consists of terminal nodes. Each node is identified with its position in the sequence, its weight, and the type of node, which can be internal or external. During each iteration of this phase a new working sequence is produced by combining two nodes of the previous working sequence into one node which replaces the leftmost node of the two and removing the rightmost one from the current sequence. This process continues until only one node is left in the sequence, which is the root of the tree.

Special rules govern the choice of the nodes that are combined at every iteration. Let qi and qk be two nodes, and their corresponding weights are wi, and wk, and their positions in the sequence are i and k respectively. In order for qi and qk to be combined the following constraints must hold, [Knu73b]:

• In the current working sequence no external nodes occur between qi and qk.
• The combined weight w = wi + wk is minimum among all the combined weights of all pairs of nodes between the two terminal nodes. In case of a tie, when more than one pair of nodes have minimum combined weight, the following is applied:
• The pair with the leftmost left node is selected.
• If they share the left node than the pair with the leftmost right node is selected.

The node produced as a result of the combination is of circular type. Its weight is equal to the sum of the weights of its children, and its position in the working sequence is the position of its left child.

To explain the behavior of the Phase I, I define a Huffman sequence2.2 to be a subsequence of a working sequence of nodes, delimited on the left and right by terminal nodes. Initially, the working sequence consists of terminal nodes only. Thus we have a sequence of n-1 Huffman sequences, each having two elements. During the Combination phase the number of Huffman sequences may reduce by one or two due to merging of adjacent sequences. The algorithm makes a greedy choice (ties are resolved in a deterministic manner) of combining nodes within a sequence. The combination process continues until there is only one node left in the working sequence - the root of the tree. Among all possible combinations the one which produces a node of minimum weight is performed.

The integers in parenthesis located above the internal nodes indicate the number of the step of the Combination phase at which the node is produced. For example: the internal node with weight 2 is produced at the first iteration. The root of the tree, internal node with weight 30, is produced at the ninth.

An alternate way to build the optimal tree in Phase I is by combining local minimum compatible pairs (l.c.m.p.). A pair of nodes is called to be a compatible pair if the two nodes in it belong to the same Huffman sequence, i.e., no external nodes occur between them. A pair of nodes is a l.c.m.p. if the weight of the left node qi is less than the weight of all nodes compatible with the right node qjand the weight of the right node qj is less than all the weights of the nodes compatible with the left node qj. Initially, when all the nodes in the sequence are terminal nodes, a pair of nodes is an l.c.m.p. if

• wi-2>wi, and

T. C. Hu proved that a l.c.m.p. in a working sequence preserves its properties after subsequent combinations of other local minimum compatible pairs, i.e., it remains as an l.c.m.p.2.3 This implies that the order of combining local minimum compatible pairs does not change the tree built in this phase. The tree is unique and is not dependent on the order in which the local minimum compatible pairs are combined. Thus one can build the optimal tree by combining local minimum compatible pairs as they are discovered.

#### Footnotes

... construction2.1
Sections 2.2.2 and 2.2.3 elaborate on level by level construction.
... sequence2.2
T. C. Hu and A. C. Tucker defined a Huffman set in [HT71]. The definition is the same. I call it "sequence" since the node's position is important.
... l.c.m.p.2.3
Proof of this Lemma can be found in [Hu82], page 179.

Next: Phase II Level Assignment Up: Detailed Description Previous: Detailed Description
Sashka Davis;961;icsg6;
1999-01-14