Experimental Project
Common Subsequence Algorithms
CSCI-665, Spring 2017



Goal

The goal of this project is twofold. First, to observe empirically the complexities of different implementations of algorithms for the same problem: finding the longest common subsequence of two sequences. Second, to find out how accurate the theoretical estimates of complexity are when compared to practical execution times.

Dates

The project can be completed individually or by a team of two students. In the latter case, please let me know by email (with a cc to your partner) the composition of your two-person team by Thursday, March 9, 2017.

Submit the source code (hardcopy) of at least one algorithm with time-measuring routines, a sample profiler run, and at least one higher-level script by Tuesday, April 11, 2017. This will be the skeleton of your full experiment.

The hardcopy of the full report, containing the description and analysis of the results, and the final source code are due Tuesday, May 9, 2017. Note that in order to write a meaningful report, the experiments should be completed, say, a week earlier. The project will be graded according to the criteria listed in the gradesheet.

Algorithms

Implement (at least) the following algorithms for the longest common subsequence problem:

In each case the task is to find the length of the longest common subsequence of two sequences, and to generate some or all of the longest common subsequences. Use C, C++, Java, or another programming language for this project, as long as you can perform fine-grained CPU time measurements of your experiments. Keep your code simple, minimize the user interface and I/O operations, and concentrate on the comparison of execution times and memory requirements.
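As a point of reference for the "length only" task, here is a minimal sketch of the textbook dynamic-programming approach, keeping only two rows of the table. It is an illustration, not necessarily one of the algorithms required above; the function name `lcs_length` is an assumption made for this example.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b.

    Textbook dynamic programming, O(len(a) * len(b)) time,
    O(min(len(a), len(b))) extra memory (two rows only).
    """
    if len(b) > len(a):
        a, b = b, a  # make b the shorter sequence so the rows stay small
    prev = [0] * (len(b) + 1)  # row for the prefix of a processed so far
    for x in a:
        curr = [0]  # column 0: the empty prefix of b
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    # Toy example for a logical-correctness check before any timing runs.
    print(lcs_length("ABCBDAB", "BDCABA"))  # prints 4
```

Keeping only two rows is what makes long inputs feasible when only the length is needed; recovering an actual subsequence requires more information than this sketch stores.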

Input Data

Use small toy examples first to ensure that all your implementations are logically correct. After that, the main input for timing experiments should be pairs of randomly generated sequences. Some pairs of input sequences, for at least one of the algorithms, must be of length at least 40000 for the case of computing only the length of the LCS.
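One possible way to produce reproducible random input pairs is sketched below. The function name `random_pair`, the default four-letter alphabet, and the seed parameter are all assumptions chosen for illustration; the alphabet size is one parameter you may want to vary.

```python
import random


def random_pair(n, m, alphabet="ACGT", seed=None):
    """Return two random strings of lengths n and m over `alphabet`.

    A fixed seed reproduces the same pair, so every algorithm
    can be timed on identical inputs.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    a = "".join(rng.choice(alphabet) for _ in range(n))
    b = "".join(rng.choice(alphabet) for _ in range(m))
    return a, b


if __name__ == "__main__":
    a, b = random_pair(40000, 40000, seed=665)
    print(len(a), len(b))  # prints 40000 40000
```

Recording the seed together with each timing makes it possible to rerun an experiment on exactly the same data later.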

Required

Optional

The last two features may easily become the subject of your final MS project, thesis, and beyond.

Experiments

Different algorithms should use the same input sequences for cpu time comparison. Design the experiments so that they are informative. Vary the values of parameters.

You should use scripts to organize your experiments. Finer time measurements can be obtained from inside the program by using time system calls in C, C++, or Java. Very fine time analysis can be done with the help of gprof (see the man entries for time and gprof).
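The sketch below shows one way to measure CPU time from inside a program and to run several algorithms on the same input pair. It uses Python's standard `time.process_time`, which reports CPU time of the current process; the helper names `cpu_time` and `compare`, and the stand-in function in the demo, are hypothetical and only illustrate the structure.

```python
import time


def cpu_time(func, *args, repeats=3):
    """Return (best CPU seconds over `repeats` runs, result of the last run)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.process_time()  # CPU time, not wall-clock time
        result = func(*args)
        best = min(best, time.process_time() - start)
    return best, result


def compare(algorithms, a, b):
    """Run every algorithm on the same pair (a, b) and collect one row each."""
    rows = []
    for name, func in algorithms.items():
        seconds, answer = cpu_time(func, a, b)
        rows.append((name, len(a), len(b), answer, seconds))
    return rows


if __name__ == "__main__":
    # Placeholder function, NOT an LCS algorithm; substitute your own.
    def stand_in(a, b):
        return sum(x == y for x, y in zip(a, b))

    for row in compare({"stand_in": stand_in}, "ABCBDAB" * 1000, "BDCABA" * 1000):
        print(row)
```

Taking the minimum over a few repeats is one common way to reduce noise from other activity on the machine; reporting the mean or median instead is an equally defensible choice, as long as the report says which one was used.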

Results

Describe how you organized your experiments. Comments should be embedded in the source code. I may ask you for a demonstration in one of our labs.

Tabulate CPU times for the same data for different algorithms. Compare them to the theoretical complexity of each algorithm, and between the algorithms. You can divide the tabulated times by the values of the complexity function, thus approximating the "constant" hidden in the O-notation.
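The division described above can be sketched as follows. The function name `hidden_constants` is an assumption, and the numbers in the demo are made-up placeholders, not measurements; replace them with your own tabulated CPU times.

```python
def hidden_constants(measurements, complexity):
    """Divide each measured time by the value of the complexity function.

    measurements: list of (n, m, seconds) tuples
    complexity:   function of (n, m), e.g. lambda n, m: n * m
    Returns a list of (n, m, seconds / complexity(n, m)).
    """
    return [(n, m, seconds / complexity(n, m)) for n, m, seconds in measurements]


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT real measurements.
    placeholder_times = [(1000, 1000, 0.9), (2000, 2000, 3.7), (4000, 4000, 14.6)]
    for n, m, c in hidden_constants(placeholder_times, lambda n, m: n * m):
        print(n, m, c)
```

If the ratios stay roughly level as the input grows, the measured times are consistent with the assumed complexity function; a steady drift up or down suggests that a different function fits the data better.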

Hints

References