Lab 7: Heapsort

Copyright RIT 2009
$Id: writeup.xml,v 1.4 2009/04/27 21:16:26 vcss233 Exp $


Goal

  1. Learn how the heapsort algorithm works.
  2. Implement the code for a binary max heapsort.
  3. Answer questions regarding the time complexity of heapsort.

Overview


The Heapsort

Source: Heap sort animation (http://de.wikipedia.org/wiki/Image:Sorting_heapsort_anim.gif)

In this lab you will learn how an array can be used to store a binary tree of data for relatively fast sorting.

Pre - Lab Work

  1. Review lecture notes by clicking here (http://www.cs.rit.edu/~vcss23x/Lecture/Week 25 - Sorting/notes.html)

  2. Read Goodrich and Tamassia, Chapter 11.

  3. Check out this heapsort applet that tests your understanding of the heapify process here (http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/heap/heapen.htm)

In-Lab Activities

Answer the questions found throughout this section by editing the file act-questions.html. Submit it as part of one of the activities (see the submission instructions).

Note: some instructors or students prefer this file to be in a plain text format. You may place your answers in a plain text file, but make sure you give the file the correct name so that it will be accepted by try. Check the submission instructions or check with your instructor.

Heapsort is an in-place, comparison based sorting algorithm. The data to be sorted can be viewed as a binary tree with the following properties:


The Heapform

The heap structure you will build is called a binary max heap. This means a node can have at most 2 children, and the data in the parent node must be greater than or equal to all of its descendant nodes. There is no relationship between the data values of the siblings of any parent (i.e. the right child does not have to be greater/less than the left child). The following diagram illustrates a few examples of invalid max heap structures:


Invalid Heaps

Since the data is initially unordered and most likely violates the rules for a valid max heap, it must be heapified. The heapify process starts at the last parent node and it continually sifts down the smallest descendant at each level. If a child has a larger value than the parent, the largest child value is swapped with the parent. That node is then sifted down again until no sift occurs, or there are no more children. Each parent node is sifted in the same manner from the last until the root, at which point the heap structure is valid:


Heapifying the data

The algorithm for sifting down the data requires three things, the array of data, and the start and ending indices in the array to sift between. It works as follows:


siftDown(data, start, end):
	call the starting index the 'root'
	while  'root' has at least one child that is not off the end:
		find the index of the 'child' with the larger value
		if 'child' value is greater than the 'root' value:
			swap the 'root' element in 'data' with 'child'
			make 'child' the new 'root'
		else:
			break
	

The heapify routine uses siftDown to generate a valid max heap where the largest value will be in the root of the tree, and every parent will have a value greater than or equal to its descendants:


heapify(data):
	get the index of the last parent and call it 'start'
	while 'start' is positive:
		sift down from 'start' to index of last element in 'data'
		decrement 'start'
	

Once the heap structure has been built. The largest element in the collection resides in the root node. It is swapped with the last leaf node so that it resides in the last position in the array. The heap is then resifted from the root, excluding the last position. This process is repeated until all of the nodes except the root have been excluded. At this point, the data is completely sorted:


Heapsort (part1)


Heapsort (part2)

The sort routine first calls heapify to generate a valid max heap structure. It then uses siftDown to perform the in-place sort. The trick is that once the largest element is stored in its final poisition, the end marker is decremented so that it will remain untouched for the remainder of the sorting:


sort(data):
	heapify 'data'

	set 'end' initially to the index of the last element in 'data'
	while 'end' is greater than 0:
		swap the first element in 'data' with 'end'
		decrement 'end' by 1
		sift down from 0 to 'end'
	

Activity #1 : Understanding Heapsort

For the first activity you will be answering questions about the heap sort algorithm. You will be required to make diagrams, on paper, that you will hand in to your instructor before continuing on with the lab.

Question #1.1:

Assume you have a heap array of 8 integers that have just been heapified: [15, 11, 8, 3, 5, 4, 2, 1]. The root element has been moved to its final position, and it is time to reheapify the array: [1, 11, 8, 3, 5, 4, 2, 15]. What does the resulting array look like after the new root value has been sifted down in the array?

Question #1.2:

Assume you have the following array of three integers: [4, 5, 2]. Show the heap structure that results from heapifying the data. To receive full credit you should show all your work, not just the final form.

Question #1.3:

Using the heapified array from the previous question, show the heap array at each step in the sorting process before the largest element is swapped into its final position and the new root is sifted down. The final step should have the fully sorted data.

Question #1.4:

Assume you have the following array of 7 integers: [11, 5, 16, 23, 19, 1, 25]. Show the heap structure that results from heapifying the data. To receive full credit you should show all your work, not just the final form.

Question #1.5:

Using the heapified array from the previous question, show the heap array at each step in the sorting process before the largest element is swapped into its final position and the new root is sifted down. The final step should have the fully sorted data.

How To Submit

Show your work to your instructor and make sure that everything checks out ok. Put your text based answers into the act-questions.txt file and submit. It is ok that you haven't completed all the questions in this file, you will be resubmitting at the end of the last activity.


   try grd-233 lab7-1 act-questions.txt
                

Activity #2 : Implementing Heapsort

In this activity you will implement the heapsort routines, and you will write a main test routine that demonstrates it sorts correctly. Take a moment to familiarize yourself with the HeapSort class.

It goes without saying that you are not allowed to use any of the pre-supplied sorting routines, (i.e. Collections.sort()). However, since you will be using an ArrayList to perform the in-place sorting, there is a very useful routine that you are allowed to use that swaps the elements at two locations: Collections.swap(java.util.List, int, int).

To begin with, you should write a main routine that creates an ArrayList of unsorted integers that calls the private siftDown routine. Use your answer from question 1.1 to verify that routine sifts nodes down correctly before continuing.

Next, implement the private heapify routine and verify that you get the same results as your answers for questions 1.2 and 1.4. You should have a valid max heap where each parent node has a value greater than its descendants.

Finally, implement the sort routine and verify that it heapifies and sorts the data. For debugging purposes you can insert a statement at the beginning of the while loop in sort that prints out the array each time before the root node is swapped and sifted. This should yield a result that is similar to your answers for questions 1.3 and 1.5. Don't forget to comment out this debug statement for your final submission, as we will be testing your sorting routines with our own test programs.

How To Submit

When you are sure that your heap sort is working properly and that your test program sufficiently tests all features of your implementation, submit your work by executing the following command:


   try grd-233 lab7-2 HeapSort.java
                

Activity #3 : Analyzing Heapsort

Now that you are comfortable with the heapsort algorithm and its implementation, answer the following questions regarding its complexity.

Question #3.1:

Once a root element has been put in its final position, how much time does it take to re-heapify the structure so that the next removal can take place? In other words, what is the time complexity of a single invocation of siftDown assuming a heap structure of size N?

Question #3.2:

In your estimation, what is the time complexity of the heapify routine that uses the siftDown method (using big-O notation)? Assume the method is being called with the entire range of an array of size N. Explain why you think this is so. Think carefully about this question before answering. For example, is it true that N is constant across all sifts or does it change?

Question #3.3:

In general, do you think there are any data sets of equal size for which heapsort performs better or worse? For example, quicksort was shown to be O(N2) if the pivot was poorly chosen each time. Explain your answer.

Question #3.4:

What do you think the overall time complexity for heapsort is? Why do you feel this way?

How To Submit

When you are ready to submit your answers, using the following command:


   try grd-233 lab7-3 act-questions.txt
                


Grade Computation

Grade Breakdown: