Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page
Theory of Computer Algorithms 4005-800-70 Winter Quarter 2003
Course Page

4005-800-70 Theory of Computer Algorithms
Programming Project

Prof. Alan Kaminsky -- Winter Quarter 2003
Rochester Institute of Technology -- Department of Computer Science

Overview
Program Requirements
Report Requirements
Submission
Late Projects
Plagiarism
Grading


Overview

You will investigate whether experimental measurements support the theoretical asymptotic running times for the "search" operation in two different implementations of the set abstraction, a hash table with chaining and a binary search tree. You will write a number of Java programs to perform the measurements and analyze the results as specified below. You will write a report describing your programs and your results as specified below. You will submit the Java source code for your programs; you will also submit your report as a PDF file.


Program Requirements

Set Interface

Write a Java interface for a simplified set abstraction. The interface must have two methods:

  1. Insert an integer into the set. If the integer is already in the set, the set is unchanged.
  2. Determine if an integer is contained in the set. Returns true if the integer is contained in the set, false if it isn't.

Hash Set Implementation

Write a Java class that implements the above set interface using a hash table with chaining. The number of slots in the hash table must be specified as a constructor parameter; the size of the hash table does not change after it is constructed. The hash function must use the division method. You may not use any classes from package java.util in your implementation.

Tree Set Implementation

Write a Java class that implements the above set interface using a binary search tree. You may not use any classes from package java.util in your implementation.

Hash Set Measurement Program

Write a program that measures the running time as a function of input size for the hash set implementation. The program must obtain the following parameters from the command line:

The program must do the following:

  1. For each input size x, where x goes from minx to maxx in a logarithmic fashion with div divisions per decade (as in the class examples):
    1. For each set 1 through numset:
      • Create a hash set that can hold x elements with a load factor of alpha.
      • Insert x different random integers into the hash set. The random integers must come from a pseudorandom number generator seeded with seed1. Note: The hash set must end up containing x different random integers even if the pseudorandom number generator generates repeated integer values.
      • Measure the time it takes to do reps searches in the hash set, where each search value is a different random integer. The random integers must come from a pseudorandom number generator seeded with seed2. The program must use the same technique as the class examples to measure the time.
      • Repeat the previous step, doubling reps each time, until the total measured time is greater than or equal to tt. Each time the program repeats the previous step, the program must re-seed the pseudorandom number generator with seed2 (so it generates the same sequence of search values).
      • Record a data triplet (x,y,z) where x is the input size, y is the measured time for one search, and z is the standard deviation of the measurement error.
  2. Fit the measured data to the appropriate asymptotic function for the hash set search running time, print out the parameter value that gives the least squares curve fit, print out the chi-square value, and print out the goodness-of-fit probability.
  3. Plot the measured data and the fitted curve on a log-log plot.

Tree Set Measurement Program

Write a program that measures the running time as a function of input size for the tree set implementation. The program must obtain the following parameters from the command line:

The program must do the following:

  1. For each input size x, where x goes from minx to maxx in a logarithmic fashion with div divisions per decade (as in the class examples):
    1. For each set 1 through numset:
      • Create a tree set.
      • Insert x different random integers into the tree set. The random integers must come from a pseudorandom number generator seeded with seed1. Note: The tree set must end up containing x different random integers even if the pseudorandom number generator generates repeated integer values.
      • Measure the time it takes to do reps searches in the tree set, where each search value is a different random integer. The random integers must come from a pseudorandom number generator seeded with seed2. The program must use the same technique as the class examples to measure the time.
      • Repeat the previous step, doubling reps each time, until the total measured time is greater than or equal to tt. Each time the program repeats the previous step, the program must re-seed the pseudorandom number generator with seed2 (so it generates the same sequence of search values).
      • Record a data triplet (x,y,z) where x is the input size, y is the measured time for one search, and z is the standard deviation of the measurement error.
  2. Fit the measured data to the appropriate asymptotic function for the tree set search running time, print out the parameter value that gives the least squares curve fit, print out the chi-square value, and print out the goodness-of-fit probability.
  3. Plot the measured data and the fitted curve on a log-log plot.


Report Requirements

Create a PDF file named "<username>.pdf", replacing <username> with your Computer Science account user name. The PDF file must begin with the following information:

Theory of Computer Algorithms
Programming Project
<Your Name>
<Your C.S. Username>
<Date>

The report must have the following numbered sections:

  1. List the names of the Java source files that you wrote and give a brief description of what each source file does.
     
  2. Run your hash set measurement program with the following arguments: minx = 2, maxx = 4, div = 10, numset = 5, tt = 2 seconds or larger (your choice), seed1 = your choice, seed2 = your choice (different from seed1), and alpha = 0.5. Give a table with the program's measured (x,y,z) data triplets. Give the plot the program produced. Give the formula for the fitted curve for the search running time. Discuss whether the experimental data agree with the theoretical asymptotic function for the search running time as a function of the input size.
     
  3. Run your hash set measurement program with the following arguments: minx = 2, maxx = 4, div = 10, numset = 5, tt = 2 seconds or larger (your choice), seed1 = your choice, seed2 = your choice (different from seed1), and alpha = 1.0. Give a table with the program's measured (x,y,z) data triplets. Give the plot the program produced. Give the formula for the fitted curve for the search running time. Discuss whether the experimental data agree with the theoretical asymptotic function for the search running time as a function of the input size.
     
  4. Run your hash set measurement program with the following arguments: minx = 2, maxx = 4, div = 10, numset = 5, tt = 2 seconds or larger (your choice), seed1 = your choice, seed2 = your choice (different from seed1), and alpha = 2.0. Give a table with the program's measured (x,y,z) data triplets. Give the plot the program produced. Give the formula for the fitted curve for the search running time. Discuss whether the experimental data agree with the theoretical asymptotic function for the search running time as a function of the input size.
     
  5. For a hash table, the theoretical asymptotic function for the search running time depends on the load factor alpha. Discuss whether the above experiments agree with the search running time's theoretical dependence on alpha.
     
  6. Run your tree set measurement program with the following arguments: minx = 2, maxx = 4, div = 10, numset = 5, tt = 2 seconds or larger (your choice), seed1 = your choice, and seed2 = your choice (different from seed1). Give a table with the program's measured (x,y,z) data triplets. Give the plot the program produced. Give the formula for the fitted curve for the search running time. Discuss whether the experimental data agree with the theoretical asymptotic function for the search running time as a function of the input size.
     
  7. Write a conclusion discussing what you learned from this project.


Submission

Put the Java source files you wrote and your report PDF file into a Java Archive (JAR) file named "username.jar", replacing username with the user name from your Computer Science Department account. The command for putting a list of files into the JAR file is:

jar cvf username.jar filename filename . . .

Send your JAR file to me by email at ark­@­cs.rit.edu. Include your full name in the email message, and include the JAR file as an attachment.

When I receive your email message, I will verify whether it includes your full name, I will verify whether your JAR file contains all the required files, and I will verify whether I can read your report PDF file. I will then send you a reply email message stating whether I accepted your submission or not. If you have not received a reply within one business day (i.e., not counting weekends), please contact me. Your project is not successfully submitted until I have sent you a confirmation that I accepted it.

The submission deadline is Thursday, February 12, 2004 at 11:59pm. The date/time at which your email message arrives in my inbox will determine whether your project meets the deadline. (The confirmation process is not automated, so you may not receive the confirmation message immediately.)

If you submit your project before the deadline, but I do not accept it, and you cannot or do not submit it again before the deadline, the project will be late (see below). I strongly advise you to submit the project several days before the deadline, so there will be time to deal with any problems that may arise in the submission process.


Late Projects

I will not accept a late project unless you arrange with me for an extension. See the Course Policies for my policy on extensions. Late projects will receive a grade of zero.


Plagiarism

The project must be entirely your own work. I will not tolerate plagiarism. If in my judgment the project is not entirely your own work, you will automatically fail the course. There are only two exceptions:

  1. You may reuse without modification a source file from the Computer Science Course Library or another published source file, provided the author has licensed you to reuse the source file, and provided you state who wrote the source file and where you got the source file.
     
  2. You may take a source file from the Computer Science Course Library or another published source file and add your own modifications, provided the author has licensed you to modify the source file, and provided you state who wrote the original source file and where you got the original source file.


Grading

The project will be graded as follows:

10 points Set interface Java source
10 points Hash set implementation Java source
10 points Tree set implementation Java source
10 points Hash set measurement program Java source
10 points Tree set measurement program Java source
10 points Report Section 1
10 points Report Section 2
10 points Report Section 3
10 points Report Section 4
10 points Report Section 5
10 points Report Section 6
10 points Report Section 7
120 points Total

Theory of Computer Algorithms 4005-800-70 Winter Quarter 2003
Course Page
Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page
Copyright © 2004 Alan Kaminsky. All rights reserved. Last updated 02-Feb-2004. Please send comments to ark­@­cs.rit.edu.