Lab 5: Comparing Sorts

Copyright RIT 2008
$Id: writeup.xml,v 1.23 2008/07/01 14:02:16 vcss233 Exp vcss233 $


Goal

In this lab you will use your theoretical knowledge of sorting to distinguish between selection sort, bubble sort, quick sort, and merge sort. You will also learn a new sorting algorithm, counting sort, that can sort a collections of numbers in O(N) time.


Team Setup

You are to work on this lab completely on your own.

Overview

Objectives

  1. Learn more about simple sorting algorithms.
  2. Learn how to develop effective test cases for sorting algorithms.
  3. Gain experience using theoretical principles to clarify empirical results.
  4. Experimentally compare the complexity of five very different sorts.
  5. Gain experience writing short technical reports.

Ask Dr. Tiger

Pre - Lab Work

  1. Review your class notes on sorting (pay special attention to notes on selection sort, bubble sort, quick sort and merge sort)

  2. Read Sections 8.2.3 and 11.1 - 11.5 of Goodrich and Tamassia.

In-Lab Activities

Each lab you will do is separated into a set of activities. Each activity has specific things you must hand in to get credit for it. Each activity is graded separately.

Activity #1 : Counting Sort

Download and unpack the lab5.jar (http://www.cs.rit.edu/~vcss233/pub/lab05/lab5.jar) that contains the code you will need to complete this lab.

One of the files that you will find in the jar file contains the abstract class SortableIntArray which maintains a collection of integers. The class provides access to the integers contained in the collection via the methods getElementAt() and putElementAt(). The class also provides a method, fill(), which will fill the collection with random integers, and a method, print(), which will print the integers in the collection, one integer per line. The last method defined by the class, sort(), will sort the integers in the collection in ascending order. Take a few moments to look at the code in the file SortableIntArray.java. Make sure that you understand what each method does, and the code that implements each method.

You will also find the files, SelectionSortableIntArray.java, InsertionSortableIntArray.java, QuickSortableIntArray.java, in the jar file. These classes are not abstract because they provide an implementation for the sort() method. As the name of the classes imply, the SelectionSortableIntArray class implements selection sort, the InsertionSortableIntArray class implements insertion sort, and the QuickSortableIntArray class implements quick sort. Take a few minutes to look at the code in each of these files and find out where the code that implements the various sorting algorithms is located. Do not proceed to the next step until you understand all of the classes, the code in each class, and how the SortableIntArray class fits into the picture.

Quick sort has its name for good reason. In general it is the fastest sort available, however, in some cases when there are restrictions on the data that has to be sorted you can sort faster than quick sort.

Suppose for example that the array to be sorted only contains 0's, 1's, and 2's. In this case, it is really easy to sort the array: Go through the array from left to right. While going through the array, count the number of 0's, the number of 1's, and the number of 2's in the array. Then refill the input array from left to right with the correct number of 0's, then 1's, then 2's.

More generally, suppose that the array to be sorted contains only integers between 0 and rangeSize-1 (inclusive). The first step in the process is to create a second array of size rangeSize. (The SortableIntArray class keeps this information for you.) Next enter a loop that steps through the array you are sorting one element at a time. While going through the array, count the number of occurrences of each integer between 0 and rangeSize-1 in the array that you created at the beginning of the algorithm (note that you can use the current number that you are counting as an index in the second array). After you have traversed the entire array, write a second loop that steps through the array that contains the counts of each of the numbers in the array. Use the information in this array to refill your original input array from left to right with the correct number of 0's, then 1's, then 2's, etc.

So for example, take the input array below. Here you will notice that there are 5 '0's in the input, so the value in the count array[0] is 5. After all the counting is done, when the algorithm is creating the sorted array, it will fill the first 5 locations of the array with '0's.


Counting Sort

This sorting algorithm is appropriately called Counting Sort. How fast is counting sort? You go through the data array twice, once to count, and once to refill it. Each step through the array takes only constant time, so the total time is linear in the size of the data array, or O(n).

In this activity, you will write a fourth subclass of the SortableIntArray class named CountSortableIntArray. This new class will implement its sort() method using the counting sort algorithm described above. You will be writing this class from scratch. Note that this class is almost identical to the other subclasses that you have been working with, with the exception of the sort() method.

How To Submit

Once you are sure your CountSortableIntArray class is correct, submit the class by typing the following command:

try grd-233 lab5-1 CountSortableIntArray.java

Activity #2 : Meet the detective

Most of the labs you've done thus far have been programming assignments: we give you a detailed specification, and you write the code to satisfy it. This rest of this lab is going to be very different. First of all you are done writing code for this lab. Instead you will be working with code that we will provide.

You will use a tool known as the Sort Detective in this lab. The Sort Detective is a learning tool that was developed by David Levine of St. Bonaventure University. It provides a graphical user interface, which allows users to run a variety of "Mystery Sorts" on data sets. Because the sorts are not named, it is your job to identify each mystery sort by its behavior--how quickly it runs, how many comparisons and movements it makes.

This lab requires a deep understanding of all the sorts you have discussed in class. You may find it useful to re-read your text, and your class notes before proceeding with this lab. Click here (http://www.cs.rit.edu/~vcss233/pub/lab05/sorts.html) to see a page that describes and summarizes a number of different sorting algorithms. (If the demo applets don't run in your browser, try a different one or see your instructor.) Note this page contains a number of sorts that were not discussed in class, you are only required to know the 5 sorts that will be used in this lab. Do not proceed any further with this lab until you are confident that you understand the algorithmic complexity of the following sort routines:

Meet the detective

Start the Sort Detective application by typing the following command at a Unix prompt:

~vcss233/pub/lab05/sortdetective user-name

where you will substitute your Unix user name for user-name in the command above.

Once you have the program running you will see that the GUI consists of three panels. The top panel with the title "Data Characteristics" is used to change the characteristics of the array of numbers the sort routines will work with. You can change the number of elements in the array by moving the slider. An array can have any where from 0 to 10,000 numbers in it. You can arrange the numbers in the array so that they are in order, in reverse order, are random, or are almost in order by clicking on the radio button next to the label that describes the distribution you would like.

The middle panel contains 5 buttons that are labelled "Sort #0", "Sort #1, and so on. Pressing one of these buttons will activiate a sort routine that will sort an array of numbers whose characteristics are given in the top panel. The Sort Detective is capable of running the following sorts: selection sort, bubble sort, quick sort, merge sort, and counting sort. What you don't know is which button activiates which sorting algorithm. Figuring that out is the point of this lab.

The Sort Detective uses your user name to determine which button is connected to which sort. To make things interesting, the assignment of sorting algorithms to buttons is different for each student. So it is very important that you use the same user name every time you run the program, otherwise you may not be able to make sense out of your results. Note that your instructor will use your user name to determine if your results are correct or not.

The third panel, titled "Sort Results", contains the results of every sorting experiment that you run. For each run the table will show the value of N (the number of elements being sorted), the distribution of the numbers in the array, the label on the button used to sort the numbers, the number of move and comparisons made by the sorting algorithm, and finally the time (measured in milliseconds) required to complete the sort. Note that the time reported by the Sort Detective is wall clock time (not CPU time). This means that the time required to sort the numbers may vary from run to run depending on how much additional work your computer was doing at the time you ran the sort. Even though the times will vary you can still use them to get a feel for the complexity of the algorithm.

Take a few minutes to familiarize yourself with the functionality of the Sort Detective before proceeding to the next step. You will be using this program extensively in the next two activities of this lab.

Activity #3 : Develop a test strategy

In order to be able to successfully complete this lab, you must develop a strategy that you will use to identify each of the sorts implemented by the Sort Detective. This is probably the single most important activity in this lab. Without a well-thought out test plan you will not be able to figure out which sort is activiated by each of the buttons. Without a plan you will just be wasting your time pushing buttons and playing with the GUI.

Developing and documenting your plan

You must develop a set of test cases, which will allow you to accurately identify each of the sorts in the Sort Detective. As you devise test cases, you may wish to consider the following hints:

  1. Keep in mind that a few well-chosen test cases are much more valuable than a multitude of arbitrary tests. The better the test cases, the easier this lab will be to complete.
  2. Try to develop tests that will allow you to separate the sorts by their complexity (you are working with O(N), O(N^2), and O(N log(N)) sorting algorithms in this lab).
  3. Once you have the sorts categorized by complexity, devise data configurations that exploit behavior that is unique to a given sort algorithm.
  4. Do the easy case first: some sorts are very different from one another, others are very similar, so it may be helpful to divide the sorts into subcategories and test each subcategory differently.

How To Submit

Once you have designed a plan for identifying each mystery sort, document the plan using this template (./Auxiliary/test_template.txt) . Rename the template test_strategy.txt and submit it using the following command:

try grd-233 lab5-2 test_strategy.txt

Activity #4 : Put the detective to work

Now it's time to put your plan into action. Run each of the tests described in your test plan and document the results. If you determine that some of your tests are not as effective as you had hoped, add and/or modify your test plan as desired. Just remember, the results you submit must be conclusive.

Once you're convinced that your tests are complete and the results can provide sufficient evidence for your conclusions, write a report that includes the following:

  1. The identification of each sort (e.g., "Sort #0 is a QuickSort");
  2. The Big-O of each sort;
  3. A brief summary of your test strategy;
  4. The defense of your conclusions, including your test results.

We do not provide a template for the report or the results included therein. Instead, we allow you to choose a format and presentation that you feel will support your conclusions effectively. As you write your report, keep in mind that identifying the sorts only accounts for a small fraction of the points for this lab. The bulk of the points rests in the quality and clarity of your written defense.

Note: You may depict your results graphically, so long as they are in GIF, JPEG, or Postscript format. ( try accepts the following file name extensions: .gif, .jpg, .pdf, .ps. ) Simply refer to the graph by its filename in the report. Make sure you submit the graph's file when you execute try.

So how long should the report be? A report which includes the results of well-chosen test cases may only need to be around two pages. If there are several test cases, the report will obviously need to be longer. Use your best judgement. Be sure to run your document through a spelling checker before submitting it. It would be a shame to lose points on a perfect paper for something as silly as typing errors!

How To Submit

Submit your report using the following command:

try grd-233 lab5-3 final_report.txt optional files


Grade Computation

Grade Breakdown: