Project 1 - A Word Search Program

Minimal Effort Due:  Friday, December 18, 2009 at 11:59pm
Final Project Due:  Saturday, January 09, 2010 at 11:59pm

Overview

In this project, you will write a program that solves word search puzzles.  A word search puzzle consists of a grid containing a number of characters and a list of words.  Some of the words in the words list are hidden in the puzzle.  This program will search the puzzle for the words in the list and will print the location of any words in the word list that are hidden in the puzzle grid.

Objectives

  1. Study the design of a complete software system.
  2. Implement a program given the program's requirements, specifications, and high-level design.
  3. See what it is like to plan your development over a longer period of time.
  4. Gain further experience working with data structures in Java.
  5. Obtain additional experience writing programs in Java.
  6. Gain more experience writing Java classes.

Summary and Activity Links

  1. Read the program requirements.
  2. Understand the program specification.
  3. Study the program design.
  4. Get the files you need to complete the project.
  5. Implement the design.
  6. Submit your project.

Getting Help

You may get help from your instructor(s) and the teaching assistants.  Violations of the Code of Conduct can result in suspension, expulsion and even criminal charges. Violations include, but are not limited to:

For details refer to the DCS Policy on Academic Integrity

We certainly do not expect there to be absolutely no communication between the students of this class; people tend to learn well that way. If you have any doubt if what you would like to do might violate our rules, please see your lecture instructor for clarification.

Program Requirements

You are to write a program that solves word search puzzles.  A word search puzzle consists of two items; a puzzle and a list of strings.  The puzzle consists of a grid of characters, such as the one shown below:

TYMANN
ULOTPA
XWDZZE
ASILPJ
WESSDO

Hidden in the grid of characters are a number of strings.  A string may start at any letter, but must proceed along cells that are consecutively connected by straight lines (either horizontally, vertically, or along a 45 degree diagonal).  For example, the string TYMANN starts at the T in the upper left hand corner, moves to the right to the Y, moves to the right to the M, and so on.  In forming a single string you may use each letter exactly once, and once you start moving in a direction you must continue moving in that direction until you reach the end of the word.  Strings may wrap around the grid and the characters in the string do not always go from left to right or from top to bottom or from the left corner to the lower right.

You can find the following strings:  LISA, MOD, PAUL, PJ, TOP, and TYMANN hidden in the puzzle given above.  The location of these strings is shown in the puzzle below:

TYMANN
ULOTPA
XWDZZE
ASILPJ
WESSDO

Note that a single character may be used to form two different strings (such as the T in TYMANN and TOP, or the M in TYMANN and MOD, or the P in TOP and PJ).  It is possible that a given string may be hidden twice in a puzzle, in which case the program only needs to identify one of the occurrences of the string (it does not matter which one is reported). 

The second part of the word search is a list of strings that the program is to look for in the puzzle.  Consider the list of strings given below:

BUSTER
LISA
PAUL
PAULA
PROJECT TWO
TYMANN
TOP

The puzzle contains the strings:  PAUL, LISA, TYMANN, and TOP.  The strings BUSTER, PAULA, and 'PROJECT TWO' are not in the puzzle.  If a string contains a space, such as, 'PROJECT TWO', the space is ignored when searching the puzzle. Case is also ignored when searching for strings in a puzzle (i.e. Paul, PAUL, and PAul are all considered the same string).

The output from your program will consist of the strings in the string list that are hidden in the puzzle and their locations.  The location of a string is given by the coordinates of the first and last letters in the puzzle.  Coordinates are always written so that the row of the letter is given first, followed by the column.  The coordinate of the letter in the upper left hand cell of the puzzle is (0,0).  The following output was generated from the puzzle and strings given above:

LISA [(3,3) (3,0)]
PAUL [(1,4) (1,1)]
TYMANN [(0,0) (0,5)]
TOP [(0,0) (3,4)]

When the program detects errors in usage or in processing the input, an appropriate error message should be generated and the program will terminate.

This is worth re-mentioning because it is very important. Unlike normal search puzzles, this program can handle words that wrap around the boundaries of the puzzle grid. For example the following puzzle:

AxxRxI
xxxTDI
ORIxDH
012345
xG1Vxx
AGxExx

with this list of words:

HORI
VERT
DIAG1
DIAG2
012345
0123450

would find the following words:

HORI [(2,5) (2,2)]
VERT [(4,3) (1,3)]
DIAG1 [(2,4) (4,2)]
DIAG2 [(1,4) (3,2)]
012345 [(3,0) (3,5)]

Make sure you understand how words wrap around diagonally! Also remember that each cell in the puzzle is only allowed to be used once when matching a word.

Program Specification

Overview

The input to your program will consist of two files; one containing the puzzle, and the second containing the search strings.  The names of these files will be the same, but their extensions will be different.  The puzzle file will have the extension '.pz' and the file containing the strings will have the extension '.wl'.

Starting the Program

The program is started from the command line by typing the name of the program followed by the name of the data files, as shown below:

java WordSearch cities

This command will read the puzzle from the file cities.pz and the strings from the file cities.wl.  If the program is not invoked correctly, the following error message will be printed to standard error and the program will be terminate without generating any other output:

Usage:  java WordSearch filename

The program will then attempt to open the puzzle file followed by the file containing the search strings.  If a file cannot be opened, the following message will be printed to standard error and the program will terminated without generating any other output:

WordSearch:  cannot open filename-with-extension

where you will substitute the complete name of the file that could not be opened (i.e., either cities.pz or cities.wl in the example above).

File Formats

Both the puzzle file and word list files are plain text files.  The first line in the word list file is a number indicating the number of words in the file.  Each line after that contains a string that is to be searched for in the puzzle (click here to see a sample word list file).

If the user specifies a word list file that does not conform to the specification given above (i.e. the word count is not present or is wrong), the following message will be printed to standard error and the program will be terminated without generating any other output:

WordSearch:  invalid word list file

If the word list is correct but contains zero words (i.e. a file with just the number 0 in it), you should treat this as a valid word list and print out the puzzle.

The first line in the puzzle file is a number indicating the number of rows, R, in the puzzle, the second line indicates the number of columns, C, in each row (both R and C must be greater than 0).  The remaining R lines in the puzzle file, each containing strings with C characters each, are the rows in the grid that makes up the puzzle.  Only letters or digits may appear in the puzzle (i.e. those characters for whom Character.isLetterOrDigit() returns true).  Click here to see a sample puzzle file.

If the user specifies a puzzle file that does not conform to the specification given above (i.e. the number of rows and columns cannot be determined, or is wrong) , the following message will be printed to standard error and the program will be terminated without generating any other output:

WordSearch:  invalid puzzle file

During the execution of the program, if an IO error should occur while reading either file, the following message will be printed to standard error and the program will be terminated without generating any other output:

WordSearch:  error reading filename-with-extension

Output

The output from your program, which will be printed to standard output, will list the puzzle, a blank line, the search strings, and a blank line.  This will be followed by the strings that are hidden in the puzzle file, printed in the same order that they appear in the file.  There will be one line of output for each string found in the puzzle.   The line will contain the string, followed by the location of the string in the puzzle.

The output for the puzzle

TYMANN
ULOTPA
XWDZZE
ASILPJ
WESSDO

and the word list

BUSTER
MOD
PAUL
PAULA
PROJECT TWO
TYMANN

is:

TYMANN
ULOTPA
XWDZZE
ASILPJ
WESSDO

BUSTER
MOD
PAUL
PAULA
PROJECT TWO
TYMANN

MOD [(0,2) (2,2)]
PAUL [(1,4) (1,1)]
TYMANN [(0,0) (0,5)]

Please note that your output must not be tab indented.

Program Design

Overview

The word search program consists of 4 classes:  WordSearch, Location, Puzzle, and PuzzleFileFormatException.

The WordSearch class is the main class that drives the program.  It is responsible for processing the command line arguments, creating a puzzle object, and creating the output.  A brief description of the remaining classes is given below:

Program Structure

The structure of the word search program is illustrated using the UML class diagram below (click here to see the Javadoc pages for these classes).  Make sure that you understand the role each class plays in the program and the behaviors that the classes provide.

uml.png (43231 bytes)

Finding Words

There are many different ways to find a word in a puzzle. The simplest approach is to search all cells for a word, and all potential combinations for a cell, regardless of whether or not the cell in question contains the first letter of the word that is being searched for.

loop over the rows from first to last
    loop over the columns from first to last
        search for the word at that starting cell in all directions

return the location of the first search that finds the word

You are free to use any search method you want for this project. You are also free to add other private data members and methods to your classes to help you accomplish the task. It is a good idea to write a separate method in the Puzzle class which searches for an individual word in the puzzle from a specific direction.

Getting the Files You Need

The files that you need to complete this project are contained in the jar file located here.  Download the jar file and unpack it in an appropriate location in your account.

After you have unpacked the jar file you should see that a directory named project1 has been created.  There are two directories contained in project1code and data

The directory, project1/data, contains the data files that your program will be tested against when you submit it.  The files named file.pz contain the puzzle files, and the file.wl files contain the corresponding word lists (you will substitute the actual names of the test cases for the word file).  The files named file.ans contain the expected output corresponding to the input files.  For example, if you type the following command:

java WordSearch ../data/cities > cities.out

The word search program will run, taking its input from the files cities.pz and cities.wl, and sending the output to the file cities.out.  It is now possible to compare the contents of cities.out to cities.ans:

diff cities.out ../data/cities.ans

If there are no differences, the program produced the correct output.

The directory project1/code is empty and it is expected that you will place your source files for the various classes in this program, in that directory.

Implementation: Write Your Code

DO NOT write any code until you understand what the word search program does, the design of the program, and the output that it produces.   Writing code before understanding the program will simply be a waste of your time, and will result in a program that does not work!!

As with any large project, it is probably a good idea to finish the easy parts of the project first and then slowly complete the harder parts.  After studying the project description you should realize that the Location and PuzzleFileFormatException classes are probably the easiest parts of this project, and the find() method in the Puzzle class is probably the hardest.  The code in the WordSearch class is not that hard to write, just a little tedious given all the files and exceptions that you need to keep track of.

Start working on the parts of this program that you feel are the easiest ones, and work your way to the hardest ones.  As you implement the individual pieces of your program be sure that the program compiles without errors.  You should also test each piece of the program as it is completed.  This may require that you create special test files or add code for testing the intermediate functionality. 

Develop your code in small pieces, and test each part of your code as you write it.  It you wait until you have all the code written to start testing, it will be almost impossible to locate and correct errors in your program.

Writing the Location class is probably a good place to start.  Part of your grade will be based on how you use RCS, so make sure that you check in your code after completing substantial changes to it.

Project Submission and Grading

Minimal Project Submission

In order to satisfy the minimal reasonable effort requirements for this project ( see CS2 syllabus for details), your program must successfully compile, handle the command line arguments, load the puzzle file into a Puzzle object, and produce output consisting of:

  1. The puzzle
  2. A blank line
  3. The word list
  4. A blank line

The final version of your program will add the strings found in the puzzle and their locations to the end of the output specified above.

When you are ready to submit your program, execute the following command in the directory where your code is stored:

try grd-232 project1-min WordSearch.java Puzzle.java Location.java PuzzleFileFormatException.java

The first 8 tests that will be executed by try check to see if your submission statifies the minimum reasonable effort criteria.  The results of these first 8 tests will be displayed as try checks your program.  The output, shown below, will be generated by try if your program meets the minimum reasonable effort criteria:

===== Test 1
Your output is correct!
Your standard error output is correct!

===== Test 2
Your output is correct!
Your standard error output is correct!

===== Test 3
Your output is correct!
Your standard error output is correct!

===== Test 4
Your output is correct!
Your standard error output is correct!

===== Test 5
Your output is correct!
Your standard error output is correct!

===== Test 6
Your output is correct!
Your standard error output is correct!

===== Test 7
Your output is correct!

===== Test 8
Your output is correct!

Files being saved:
Location.java Puzzle.java PuzzleFileFormatException.java WordSearch.java

project1 has been submitted.

If your program fails a minimum reasonable effort test, try will display a brief message indicating the general nature of the problem and will terminate. In order to receive credit for the project, your program must pass try by the deadline for minimal submission.

Final Project Submission

Once you complete the minimal effort submission, you should make a complete RCS version for it. You can then start working on finding words in the puzzle and printing them out. Once you do this, do not submit to the minimal effort target. You will get incorrect results for some tests because of the extra output.

The target for the final project submission is:

try grd-232 project1 WordSearch.java Puzzle.java Location.java PuzzleFileFormatException.java

To receive full credit, you must submit the project by the final project deadline.

When you feel that you have reached an important milestone in the program's development, submit it. This way if you fail to get any further before the due date at least you will have submitted something.

Remember to submit early, and often. The labs get very busy the night a project is due! Although you may start to run try before midnight, it may not finish before midnight. The best strategy is to finish before the labs get crazy, then you can sit back and watch the others panic.

Project Grade

The project grade will be computed as follows:

  1. Your program is graded first for correctness. After checking the program for correct output, ...
  2. Your program's algorithms and implementation are graded; if they are unclear or low quality, you will lose points.
  3. Your is graded for style and coding standards. Violating programming standards and incorrect use of RCS will cost points also.

The distribution of grade percentages between the minimal and final submissions is 30-70. Failure to submit the minimal submission will lose 30% of the total grade, and a missing final submission will lose 70%.

A program that produces wrong answers will get a low grade regardless of the quality of its algorithms and how well it adheres to standards.