Copyright RIT 2008
$Id: writeup.xml,v 1.16 2009/03/19 18:57:06 vcss233 Exp vcss233 $
In this lab you will get some practical experience in the different time complexities of algorithms.
You are to work on this lab in teams of size 2. Students who do not abide by this rule will not get full points for in-class participation.
The later activities in this lab deal with a classic computer science problem called the Knapsack problem, A.K.A. "subset sum". The problem's question is as follows:
Given a set of numbers and a desired sum, is it possible to use some (a subset) of the numbers to form that sum?
Or, using the knapsack metaphor, given a set of items and a knapsack of a certain size, is there a subset of the items that exactly fills up the knapsack?
If you think this is an easy problem to solve, you are right. The issue here is that the obvious, simple algorithm that solves it takes a very long time to run!
Look at it this way: How many subsets does
a set of size K have? You can get
a big hint if you remember how mathematicians
express the set of all subsets of K --
it's written as 2K. The number of elements in
this set is 2x, if x is
the number of elements in the original set
K.
So how bad is that? Well, if your set has 5 elements it will have to try 32 subsets. Not bad at all. But if you have a 50-element set, well, if you could check a single subset in a nanosecond (1 billionth of a second), this set would take you 13 days to check all the subsets!
This problem belongs to a set of problems that are called NP-complete. The knapsack problem has an exponential explosion related to the goal sum, but it can be solved using dynamic programming; other NP-complete problems are less tractable. You can check out http://mathworld.wolfram.com/NP-Problem.html if you want to learn more about these problems. In any case, if you found an algorithm for one of them that did not take exponential time, instant Ph.D.!
You will implement Knapsack. As a way to measure time, you will create a special array class to count the number of times any of its elements is accessed. Remember that an operation is chosen because it is assumed (or has been proved) to dominate the time that the program takes overall.
What will make your experiment interesting is that you will first implement a restricted version of Knapsack that asks this question:
Given a set of numbers, a desired sum, and a desired subset of size N, is it possible to use N of the numbers to form that sum?
You might naively think at first that, if there are 50 elements in a set, then N could have a value anywhere from 0 to 50, so just checking one of those sizes would still take about a 50th of the original time. Exponential time divided by a constant is still exponential time. It turns out that that is wrong, as you will see.
Read this entire document.
Review your notes from class and the text about time complexity.
Fetch the required files from /~vcss233/pub/lab02/labTimeComplexity1.jar and extract them. If you are working on a Sun computer in the CS department, you do not have to copy the jar file to your local account. Just use the file name above, minus the initial "/".
Answer the questions found throughout this section by editing the file act-questions.html. Submit it as part of one of the activities (see the submission instructions).
You are to build a structure based on array, but with additional instrumentation added in. The term instrumentation, when applied to software, refers to additional code whose purpose is to monitor or report on the execution of the system to which it is attached. It has no effect on the intended functionality of the rest of the system, except possibly to degrade its performance!
Look at the specification for
InstrumentedArray. It is basically
a very simple wrapper around an array of
Integers. The instrumentation is a counter
that records the number of times the
InstrumentedArray.get(int)
method has been called since the instance was
created. Feel free to add test code as static methods.
If a program used an InstrumentedArray
only to add up all of its
elements, and the size of the array were N,
what would you expect the "get count"
to be when the program was finished?
The constructor for this class takes a native array as an argument. You do not have to copy the array's contents. A reference to the provided array is enough for our purposes.
When you are convinced your class is working correctly, submit it with the following command.
|
The simple version of Knapsack restricts the size of the subset to be a fixed value. So the question becomes,
Is there a subset of size N from set S whose elements' sum is exactly K ?
The provided class KnapsackN already has a partial implementation inside it. It is based on the following algorithm:
|
The code that initializes and advances the choices
array has already been written. You must fill in the code
for the other missing parts. Probably the most important
ones are the
KnapsackN.main(java.lang.String[])
and
KnapsackN.run()
methods. The former will use a
List class and
the InstrumentedArray you have already
developed. The latter will realize the algorithm
shown above.
The questions that follow do not require that you have already run the code, or even written it. However you are free to do the work in any order. You could even jump ahead to activity 4 if you wanted. The reason we put the questions here was to try to get you to predict the program's behavior before it was written.
Consider how many subsets will be tried by this program. Write down a mathematical expression or series that expresses this number. You should write it in terms of X, the size of the original set, and N, the size of the desired subset.
If you are having trouble with the mathematical notation, express it in words as concisely as possible.
What is the time complexity of the number of subsets checked? You should express your answer in terms of the size of the input X and the given constant N, the size of the subset sought. Use big-O notation.
If you count individual element accesses instead of the grosser subset creation count, does that affect your previous answer? Remember, N is a constant here.
The set we've been calling S comes
from standard input. A few sample data files were provided
in the jar file you fetched. Look at the data they contain
and try many different target sums and subset sizes.
You can send the contents of a file into a program's standard input. by using the input redirection operator, <. The following command will send the contents of dataFile to the program as if the user had typed the contents of the file manually, and the program reads the information from System.in.
|
The standard output from your program should
be only one item, the number of get()
accesses to the array it took the program to
check out all its subsets.
On standard error
you should print only one word:
yes,
if at least one subset was found.
no,
if no subsets were found.
When you are convinced your class is working correctly, submit it with the following command.
|
You may not make changes in InstrumentedArray to make this work.
As you know, the general Knapsack problem puts no
restriction on the size of the chosen subset.
By utilizing the code you have already fetched or
written, implementing this algorithm should be
fairly easy. Simply make repeated instances of
KnapsackN for all subset sizes
from 0 to the length of S.
Copy the KnapsackN.main(java.lang.String[]) method and modify it by adding a loop to change the subset size.
Consider how many subsets will be tried by this program. Write down a mathematical expression or series that expresses this number. You should write it in terms of X, the size of the original set. Express it as a time complexity measure. That is, use big-O notation.
(Hint: we already told you!)
If you count individual element accesses instead of the grosser subset creation count, does that affect your previous answer?
The output should basically be the same as in the previous activity.
The standard output from your program should
be only one item, the number of get()
accesses to the array it took the program to
check out all its subsets.
On standard error
you should print only one word:
yes,
if at least one subset was found.
no,
if no subsets were found.
When you are convinced your class is working correctly, submit it with the following command.
|
You may not make changes in InstrumentedArray to make this work.
This activity contains additional questions you must answer. As mentioned above, all questions can be found in act-questions.html. Submit an edited version of this file.
Pick some values of X and run both programs from the last two activities. For the restricted version, use a subset size of 3. Plot the numbers on a graph. You can use the provided steps to generate the postscript file of the graph or you may use any other program to plot your data (as long as you generate a postscript file of your graph and call the file data.ps).
Briefly state your impressions of your results. Were they what you expected?
One way you can generate a plot of your data is to use a tool on the
UNIX systems called gnuplot.
(The gnuplot
program generates a plot file suitable for printing and display.)
We have put together a simple procedure that you can
follow to plot your data, and create a postscript file containing that
graph.
To view what the postscript file contains, use the command:
|
|
Grade Breakdown: