|
Alan Kaminsky
|
|
•
|
|
Department of Computer Science
|
|
•
|
|
Rochester Institute of Technology
|
|
•
|
|
4486 +
2220 =
6706
|
|
Home Page
|
|
Theory of Computer Algorithms
|
|
4005-800-70
|
|
Winter Quarter 2003
|
|
Course Page
|
4005-800-70
Theory of Computer Algorithms
Programming Project
Prof. Alan Kaminsky -- Winter Quarter 2003
Rochester Institute of Technology -- Department of Computer Science
Overview
Program Requirements
Report Requirements
Submission
Late Projects
Plagiarism
Grading
Overview
You will investigate whether experimental measurements
support the theoretical asymptotic running times
for the "search" operation
in two different implementations
of the set abstraction,
a hash table with chaining
and a binary search tree.
You will write a number of Java programs
to perform the measurements
and analyze the results
as specified below.
You will write a report
describing your programs and your results
as specified below.
You will submit the Java source code for your programs;
you will also submit your report as a PDF file.
Program Requirements
Set Interface
Write a Java interface for a simplified set abstraction.
The interface must have two methods:
-
Insert an integer into the set.
If the integer is already in the set,
the set is unchanged.
-
Determine if an integer is contained in the set.
Returns true if the integer is contained in the set,
false if it isn't.
Hash Set Implementation
Write a Java class that implements the above set interface
using a hash table with chaining.
The number of slots in the hash table
must be specified as a constructor parameter;
the size of the hash table does not change
after it is constructed.
The hash function must use the division method.
You may not use any classes
from package java.util
in your implementation.
Tree Set Implementation
Write a Java class that implements the above set interface
using a binary search tree.
You may not use any classes
from package java.util
in your implementation.
Hash Set Measurement Program
Write a program that measures the running time
as a function of input size
for the hash set implementation.
The program must obtain the following parameters
from the command line:
-
minx --
Minimum input size, exponent of 10
(as in the class examples)
-
maxx --
Maximum input size, exponent of 10
(as in the class examples)
-
div --
Input size divisions per decade
(as in the class examples)
-
numset --
Number of sets to do for each input size
-
tt --
Total time for each measurement (seconds)
(as in the class examples)
-
seed1 --
Random seed for filling the set
-
seed2 --
Random seed for doing searches
-
alpha --
Load factor (a real number)
The program must do the following:
-
For each input size x,
where x goes from minx to maxx
in a logarithmic fashion
with div divisions per decade
(as in the class examples):
-
For each set 1 through numset:
-
Create a hash set that can hold x elements
with a load factor of alpha.
-
Insert x different random integers into the hash set.
The random integers must come from a pseudorandom number generator
seeded with seed1.
Note:
The hash set must end up containing x different random integers
even if the pseudorandom number generator
generates repeated integer values.
-
Measure the time it takes to do reps searches
in the hash set,
where each search value is a different random integer.
The random integers must come from a pseudorandom number generator
seeded with seed2.
The program must use the same technique as the class examples
to measure the time.
-
Repeat the previous step,
doubling reps each time,
until the total measured time
is greater than or equal to tt.
Each time the program repeats the previous step,
the program must re-seed the pseudorandom number generator
with seed2
(so it generates the same sequence of search values).
-
Record a data triplet (x,y,z)
where x is the input size,
y is the measured time for one search,
and z is the standard deviation of the measurement error.
-
Fit the measured data to the appropriate asymptotic function
for the hash set search running time,
print out the parameter value that gives the least squares curve fit,
print out the chi-square value,
and print out the goodness-of-fit probability.
-
Plot the measured data and the fitted curve
on a log-log plot.
Tree Set Measurement Program
Write a program that measures the running time
as a function of input size
for the tree set implementation.
The program must obtain the following parameters
from the command line:
-
minx --
Minimum input size, exponent of 10
(as in the class examples)
-
maxx --
Maximum input size, exponent of 10
(as in the class examples)
-
div --
Input size divisions per decade
(as in the class examples)
-
numset --
Number of sets to do for each input size
-
tt --
Total time for each measurement (seconds)
(as in the class examples)
-
seed1 --
Random seed for filling the set
-
seed2 --
Random seed for doing searches
The program must do the following:
-
For each input size x,
where x goes from minx to maxx
in a logarithmic fashion
with div divisions per decade
(as in the class examples):
-
For each set 1 through numset:
-
Create a tree set.
-
Insert x different random integers into the tree set.
The random integers must come from a pseudorandom number generator
seeded with seed1.
Note:
The tree set must end up containing x different random integers
even if the pseudorandom number generator
generates repeated integer values.
-
Measure the time it takes to do reps searches
in the tree set,
where each search value is a different random integer.
The random integers must come from a pseudorandom number generator
seeded with seed2.
The program must use the same technique as the class examples
to measure the time.
-
Repeat the previous step,
doubling reps each time,
until the total measured time
is greater than or equal to tt.
Each time the program repeats the previous step,
the program must re-seed the pseudorandom number generator
with seed2
(so it generates the same sequence of search values).
-
Record a data triplet (x,y,z)
where x is the input size,
y is the measured time for one search,
and z is the standard deviation of the measurement error.
-
Fit the measured data to the appropriate asymptotic function
for the tree set search running time,
print out the parameter value that gives the least squares curve fit,
print out the chi-square value,
and print out the goodness-of-fit probability.
-
Plot the measured data and the fitted curve
on a log-log plot.
Report Requirements
Create a PDF file named "<username>.pdf",
replacing <username>
with your Computer Science account user name.
The PDF file must begin with the following information:
Theory of Computer Algorithms
Programming Project
<Your Name>
<Your C.S. Username>
<Date>
The report must have the following numbered sections:
-
List the names of the Java source files that you wrote
and give a brief description of what each source file does.
-
Run your hash set measurement program with the following arguments:
minx = 2,
maxx = 4,
div = 10,
numset = 5,
tt = 2 seconds or larger (your choice),
seed1 = your choice,
seed2 = your choice (different from seed1), and
alpha = 0.5.
Give a table with the program's measured
(x,y,z) data triplets.
Give the plot the program produced.
Give the formula for the fitted curve
for the search running time.
Discuss whether the experimental data
agree with the theoretical asymptotic function
for the search running time
as a function of the input size.
-
Run your hash set measurement program with the following arguments:
minx = 2,
maxx = 4,
div = 10,
numset = 5,
tt = 2 seconds or larger (your choice),
seed1 = your choice,
seed2 = your choice (different from seed1), and
alpha = 1.0.
Give a table with the program's measured
(x,y,z) data triplets.
Give the plot the program produced.
Give the formula for the fitted curve
for the search running time.
Discuss whether the experimental data
agree with the theoretical asymptotic function
for the search running time
as a function of the input size.
-
Run your hash set measurement program with the following arguments:
minx = 2,
maxx = 4,
div = 10,
numset = 5,
tt = 2 seconds or larger (your choice),
seed1 = your choice,
seed2 = your choice (different from seed1), and
alpha = 2.0.
Give a table with the program's measured
(x,y,z) data triplets.
Give the plot the program produced.
Give the formula for the fitted curve
for the search running time.
Discuss whether the experimental data
agree with the theoretical asymptotic function
for the search running time
as a function of the input size.
-
For a hash table,
the theoretical asymptotic function
for the search running time
depends on the load factor alpha.
Discuss whether the above experiments
agree with the search running time's
theoretical dependence on alpha.
-
Run your tree set measurement program with the following arguments:
minx = 2,
maxx = 4,
div = 10,
numset = 5,
tt = 2 seconds or larger (your choice),
seed1 = your choice, and
seed2 = your choice (different from seed1).
Give a table with the program's measured
(x,y,z) data triplets.
Give the plot the program produced.
Give the formula for the fitted curve
for the search running time.
Discuss whether the experimental data
agree with the theoretical asymptotic function
for the search running time
as a function of the input size.
-
Write a conclusion discussing what you learned
from this project.
Submission
Put the Java source files you wrote
and your report PDF file into a Java Archive (JAR) file
named "username.jar",
replacing username with the user name
from your Computer Science Department account.
The command for putting a list of files
into the JAR file is:
jar cvf username.jar filename filename . . .
Send your JAR file to me by email at
ark@cs.rit.edu.
Include your full name in the email message,
and include the JAR file as an attachment.
When I receive your email message,
I will verify whether it includes your full name,
I will verify whether your JAR file
contains all the required files,
and I will verify whether I can read your report PDF file.
I will then send you a reply email message
stating whether I accepted your submission or not.
If you have not received a reply
within one business day (i.e., not counting weekends),
please contact me.
Your project is not successfully submitted
until I have sent you a confirmation that I accepted it.
The submission deadline is Thursday, February 12, 2004 at 11:59pm.
The date/time at which your email message arrives in my inbox
will determine whether your project meets the deadline.
(The confirmation process is not automated,
so you may not receive the confirmation message immediately.)
If you submit your project before the deadline,
but I do not accept it,
and you cannot or do not submit it again before the deadline,
the project will be late (see below).
I strongly advise you to submit the project
several days before the deadline,
so there will be time to deal with any problems
that may arise in the submission process.
Late Projects
I will not accept a late project
unless you arrange with me for an extension.
See the Course Policies for my
policy on extensions.
Late projects will receive a grade of zero.
Plagiarism
The project must be entirely your own work.
I will not tolerate plagiarism.
If in my judgment the project is not entirely your own work,
you will automatically fail the course.
There are only two exceptions:
-
You may reuse without modification
a source file from the
Computer Science Course Library
or another published source file,
provided the author has licensed you to reuse the source file,
and provided you state who wrote the source file
and where you got the source file.
-
You may take a source file from the
Computer Science Course Library
or another published source file
and add your own modifications,
provided the author has licensed you to modify the source file,
and provided you state who wrote the original source file
and where you got the original source file.
Grading
The project will be graded as follows:
| 10 points |
|
Set interface Java source |
| 10 points |
|
Hash set implementation Java source |
| 10 points |
|
Tree set implementation Java source |
| 10 points |
|
Hash set measurement program Java source |
| 10 points |
|
Tree set measurement program Java source |
| 10 points |
|
Report Section 1 |
| 10 points |
|
Report Section 2 |
| 10 points |
|
Report Section 3 |
| 10 points |
|
Report Section 4 |
| 10 points |
|
Report Section 5 |
| 10 points |
|
Report Section 6 |
| 10 points |
|
Report Section 7 |
| 120 points |
|
Total |
|
Theory of Computer Algorithms
|
|
4005-800-70
|
|
Winter Quarter 2003
|
|
Course Page
|
|
Alan Kaminsky
|
|
•
|
|
Department of Computer Science
|
|
•
|
|
Rochester Institute of Technology
|
|
•
|
|
4486 +
2220 =
6706
|
|
Home Page
|
Copyright © 2004 Alan Kaminsky.
All rights reserved.
Last updated 02-Feb-2004.
Please send comments to ark@cs.rit.edu.