Lab 2: Time Complexity Analysis

Copyright RIT 2008
$Id: writeup.xml,v 1.16 2009/03/19 18:57:06 vcss233 Exp $


Goal

In this lab you will get some practical experience in the different time complexities of algorithms.


Team Setup

You are to work on this lab in teams of size 2. Students who do not abide by this rule will not get full points for in-class participation.

Overview

Time Complexity of Knapsack

The later activities in this lab deal with a classic computer science problem called the Knapsack problem, A.K.A. "subset sum". The problem's question is as follows:

Given a set of numbers and a desired sum, is it possible to use some (a subset) of the numbers to form that sum?

Or, using the knapsack metaphor, given a set of items and a knapsack of a certain size, is there a subset of the items that exactly fills up the knapsack?

If you think this is an easy problem to solve, you are right. The issue here is that the obvious, simple algorithm that solves it takes a very long time to run!

Look at it this way: How many subsets does a set of size K have? You can get a big hint if you remember how mathematicians express the set of all subsets of K -- it's written as 2K. The number of elements in this set is 2x, if x is the number of elements in the original set K.

So how bad is that? Well, if your set has 5 elements it will have to try 32 subsets. Not bad at all. But if you have a 50-element set, well, if you could check a single subset in a nanosecond (1 billionth of a second), this set would take you 13 days to check all the subsets!

This problem belongs to a set of problems that are called NP-complete. The knapsack problem has an exponential explosion related to the goal sum, but it can be solved using dynamic programming; other NP-complete problems are less tractable. You can check out http://mathworld.wolfram.com/NP-Problem.html if you want to learn more about these problems. In any case, if you found an algorithm for one of them that did not take exponential time, instant Ph.D.!

You will implement Knapsack. As a way to measure time, you will create a special array class to count the number of times any of its elements is accessed. Remember that an operation is chosen because it is assumed (or has been proved) to dominate the time that the program takes overall.

What will make your experiment interesting is that you will first implement a restricted version of Knapsack that asks this question:

Given a set of numbers, a desired sum, and a desired subset of size N, is it possible to use N of the numbers to form that sum?

You might naively think at first that, if there are 50 elements in a set, then N could have a value anywhere from 0 to 50, so just checking one of those sizes would still take about a 50th of the original time. Exponential time divided by a constant is still exponential time. It turns out that that is wrong, as you will see.

Pre - Lab Work

  1. Read this entire document.

  2. Review your notes from class and the text about time complexity.

  3. Fetch the required files from /~vcss233/pub/lab02/labTimeComplexity1.jar and extract them. If you are working on a Sun computer in the CS department, you do not have to copy the jar file to your local account. Just use the file name above, minus the initial "/".

In-Lab Activities

Answer the questions found throughout this section by editing the file act-questions.html. Submit it as part of one of the activities (see the submission instructions).

Note: some instructors or students prefer this file to be in a plain text format. You may place your answers in a plain text file, but make sure you give the file the correct name so that it will be accepted by try. Check the submission instructions or check with your instructor.

Activity #1 : Implementing an Instrumented Array

You are to build a structure based on array, but with additional instrumentation added in. The term instrumentation, when applied to software, refers to additional code whose purpose is to monitor or report on the execution of the system to which it is attached. It has no effect on the intended functionality of the rest of the system, except possibly to degrade its performance!

Look at the specification for InstrumentedArray. It is basically a very simple wrapper around an array of Integers. The instrumentation is a counter that records the number of times the InstrumentedArray.get(int) method has been called since the instance was created. Feel free to add test code as static methods.

Question #1.1:

If a program used an InstrumentedArray only to add up all of its elements, and the size of the array were N, what would you expect the "get count" to be when the program was finished?

Note

The constructor for this class takes a native array as an argument. You do not have to copy the array's contents. A reference to the provided array is enough for our purposes.

How To Submit

When you are convinced your class is working correctly, submit it with the following command.


try grd-233 lab2-1 InstrumentedArray.java
            

Activity #2 : Knapsack-N: A Simpler Knapsack Problem

The simple version of Knapsack restricts the size of the subset to be a fixed value. So the question becomes,

Is there a subset of size N from set S whose elements' sum is exactly K ?

Provided Implementation

The provided class KnapsackN already has a partial implementation inside it. It is based on the following algorithm:


Create an array of size N that will hold indices from the larger array S;
call that array "choices".
For each valid setting of the indices in choices,
    See if the designated set of elements sums to K.
            

The code that initializes and advances the choices array has already been written. You must fill in the code for the other missing parts. Probably the most important ones are the KnapsackN.main(java.lang.String[]) and KnapsackN.run() methods. The former will use a List class and the InstrumentedArray you have already developed. The latter will realize the algorithm shown above.

The questions that follow do not require that you have already run the code, or even written it. However you are free to do the work in any order. You could even jump ahead to activity 4 if you wanted. The reason we put the questions here was to try to get you to predict the program's behavior before it was written.

Question #2.1:

Consider how many subsets will be tried by this program. Write down a mathematical expression or series that expresses this number. You should write it in terms of X, the size of the original set, and N, the size of the desired subset.

If you are having trouble with the mathematical notation, express it in words as concisely as possible.

Question #2.2:

What is the time complexity of the number of subsets checked? You should express your answer in terms of the size of the input X and the given constant N, the size of the subset sought. Use big-O notation.

Question #2.3:

If you count individual element accesses instead of the grosser subset creation count, does that affect your previous answer? Remember, N is a constant here.

Program Input

The set we've been calling S comes from standard input. A few sample data files were provided in the jar file you fetched. Look at the data they contain and try many different target sums and subset sizes.

You can send the contents of a file into a program's standard input. by using the input redirection operator, <. The following command will send the contents of dataFile to the program as if the user had typed the contents of the file manually, and the program reads the information from System.in.


java KnapsackN 3 < dataFile
            

Output Specifications

The standard output from your program should be only one item, the number of get() accesses to the array it took the program to check out all its subsets. On standard error you should print only one word:

Additional messages may be printed to standard error if there was a problem with the execution. The details of these messages are shown in the comments embedded in the partial implementation of KnapsackN you extracted from the jar file.

How To Submit

When you are convinced your class is working correctly, submit it with the following command.


try grd-233 lab2-2 KnapsackN.java
            

You may not make changes in InstrumentedArray to make this work.

Activity #3 : The General Knapsack (subset sum) Problem

As you know, the general Knapsack problem puts no restriction on the size of the chosen subset. By utilizing the code you have already fetched or written, implementing this algorithm should be fairly easy. Simply make repeated instances of KnapsackN for all subset sizes from 0 to the length of S.

Suggested Approach

Copy the KnapsackN.main(java.lang.String[]) method and modify it by adding a loop to change the subset size.

Question #3.1:

Consider how many subsets will be tried by this program. Write down a mathematical expression or series that expresses this number. You should write it in terms of X, the size of the original set. Express it as a time complexity measure. That is, use big-O notation.

(Hint: we already told you!)

Question #3.2:

If you count individual element accesses instead of the grosser subset creation count, does that affect your previous answer?

Output Specifications

The output should basically be the same as in the previous activity.

The standard output from your program should be only one item, the number of get() accesses to the array it took the program to check out all its subsets. On standard error you should print only one word:

Additional messages may be printed to standard error if there was a problem with the execution. The details of these messages are shown in the comments embedded in the partial implementation of Knapsack you extracted from the jar file.

How To Submit

When you are convinced your class is working correctly, submit it with the following command.


try grd-233 lab2-3 Knapsack.java
            

You may not make changes in InstrumentedArray to make this work.

Activity #4 : Questionnaire

This activity contains additional questions you must answer. As mentioned above, all questions can be found in act-questions.html. Submit an edited version of this file.

Question #4.1:

Pick some values of X and run both programs from the last two activities. For the restricted version, use a subset size of 3. Plot the numbers on a graph. You can use the provided steps to generate the postscript file of the graph or you may use any other program to plot your data (as long as you generate a postscript file of your graph and call the file data.ps).

Briefly state your impressions of your results. Were they what you expected?

Plotting Your Data

One way you can generate a plot of your data is to use a tool on the UNIX systems called gnuplot. (The gnuplot program generates a plot file suitable for printing and display.) We have put together a simple procedure that you can follow to plot your data, and create a postscript file containing that graph.

  1. Save the Auxiliary/data file, and edit the file to add your own data. Also save the file Auxiliary/data_plot_print, make sure you leave the file names unchanged.
  2. Execute the program gnuplot from the command line.
  3. At the gnuplot prompt, type load 'data_plot_print'. Note that you need to include the filename in quotes. This script creates a file called data.ps.
  4. Type quit at the gnuplot prompt to finish.

To view what the postscript file contains, use the command:


gv data.ps 

How To Submit


try grd-233 lab2-4 act-questions.html data.ps
                


Grade Computation

Grade Breakdown: