Alan Kaminsky Department of Computer Science Rochester Institute of Technology
Operating Systems I 4003-440-02 Winter Quarter 2012

4003-440-02 Operating Systems I
Programming Project 1

Prof. Alan Kaminsky -- Winter Quarter 2012
Rochester Institute of Technology -- Department of Computer Science

Overview
Search Engine Log Analysis
Software Requirements -- Clarifications
Software Design
Submission Requirements
Grading Criteria
Late Projects
Plagiarism
Resubmission


Overview

Write a Java program that uses multiple threads. The goal is to learn about thread creation, execution, and termination.


Search Engine Log Analysis

Privacy Invaders, Inc. runs PI, the best search engine in the world. The company has a server farm with dozens of web servers that respond to search requests from users all over the Internet. Each web server keeps a log of search requests the server has processed. Each log is a plain text file. Each line of the file contains the following information about one request: date in the format YYYY/MM/DD, time in the format HH:MM:SS, user's IP address in dotted decimal notation, and a list of one or more words that are the search terms the user specified. Each information item is separated from the next by a space character.

Here is an example of a log file named log01.txt:

2012/11/29 13:00:42 1.1.2.3 bill gates
2012/11/29 13:00:45 2.3.5.8 steve jobs
2012/11/29 13:01:02 3.5.8.13 car bomb taliban
2012/11/29 13:05:56 3.5.8.13 nuclear weapon iran
2012/11/30 09:45:16 5.8.13.21 nutcracker
2012/11/30 10:14:07 8.13.21.34 sports car
2012/11/30 10:14:09 3.5.8.13 hezbollah bomb car

Here is an example of a log file named log02.txt:

2012/11/29 13:00:32 1.1.2.3 paul allen
2012/11/29 13:02:17 3.5.8.13 car bomb al qaeda
2012/11/30 10:14:17 8.13.21.34 nissan car dealer rochester ny
2012/11/30 10:15:21 3.5.8.13 iraq wmd
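
As an illustration of the line format described above, here is a minimal sketch of splitting one log line into its items. The class name LogEntry and its fields are hypothetical, not part of the assignment, and the sketch assumes every line is well formed with exactly one space between items. It avoids language features newer than JDK 1.6.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of the assignment): one parsed log line. */
final class LogEntry {
    final String date;        // YYYY/MM/DD, kept as text
    final String time;        // HH:MM:SS, kept as text
    final String ipAddress;   // dotted decimal, kept as text
    final List<String> terms; // search terms, lowercased

    private LogEntry(String date, String time, String ipAddress, List<String> terms) {
        this.date = date;
        this.time = time;
        this.ipAddress = ipAddress;
        this.terms = Collections.unmodifiableList(terms);
    }

    /** Splits one log line on single spaces: date, time, IP address, then the terms. */
    static LogEntry parse(String line) {
        String[] fields = line.split(" ");
        if (fields.length < 4) {
            throw new IllegalArgumentException("Malformed log line: " + line);
        }
        List<String> terms = new ArrayList<String>();
        for (int i = 3; i < fields.length; i++) {
            // Lowercase so that later comparisons ignore case.
            terms.add(fields[i].toLowerCase(Locale.ROOT));
        }
        return new LogEntry(fields[0], fields[1], fields[2], terms);
    }

    public static void main(String[] args) {
        LogEntry e = parse("2012/11/29 13:01:02 3.5.8.13 car bomb taliban");
        System.out.println(e.ipAddress + " " + e.terms); // prints "3.5.8.13 [car, bomb, taliban]"
    }
}
```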

Privacy Invaders, Inc. sells a service that lets customers analyze the server logs. The customer specifies a list of one or more search terms. Privacy Invaders, Inc. then analyzes each log file and prints a report for each log file. The first line of the report is the name of the log file. The rest of the report contains one line for each user that did a search containing all the search terms the customer specified. "User" = unique IP address. A search term the customer specified must exactly match a search term in the log file, except uppercase or lowercase does not matter. Each line contains the following information about one user: IP address in dotted decimal notation, a space character, and the number of searches the user performed that contained all the specified search terms. The users are listed in ascending order of IP address.

Here are the reports for the above log files if the customer specifies the search term "car":

log01.txt
3.5.8.13 2
8.13.21.34 1
log02.txt
3.5.8.13 1
8.13.21.34 1

Here are the reports for the above log files if the customer specifies the search terms "car bomb":

log01.txt
3.5.8.13 2
log02.txt
3.5.8.13 1
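
The matching rule can be sketched as a set-containment test followed by a per-user count. This is only one possible approach; the class TermMatcher and its method names are hypothetical. The sketch assumes the customer's terms have already been lowercased, and it does not sort the users by IP address, which a real report would still need to do.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: counts, per IP address, the searches containing all wanted terms. */
final class TermMatcher {

    /** True if the log line's search terms include every wanted (lowercase) term. */
    static boolean matches(Set<String> wanted, String logLine) {
        String[] fields = logLine.split(" ");
        Set<String> present = new HashSet<String>();
        for (int i = 3; i < fields.length; i++) {   // fields 0..2 are date, time, IP
            present.add(fields[i].toLowerCase(Locale.ROOT));
        }
        return present.containsAll(wanted);
    }

    /** Maps each IP address to its number of matching searches (insertion order, unsorted). */
    static Map<String, Integer> countByUser(Set<String> wanted, List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : logLines) {
            if (matches(wanted, line)) {
                String ip = line.split(" ")[2];
                Integer old = counts.get(ip);
                counts.put(ip, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> log01 = Arrays.asList(
            "2012/11/29 13:00:42 1.1.2.3 bill gates",
            "2012/11/29 13:00:45 2.3.5.8 steve jobs",
            "2012/11/29 13:01:02 3.5.8.13 car bomb taliban",
            "2012/11/29 13:05:56 3.5.8.13 nuclear weapon iran",
            "2012/11/30 09:45:16 5.8.13.21 nutcracker",
            "2012/11/30 10:14:07 8.13.21.34 sports car",
            "2012/11/30 10:14:09 3.5.8.13 hezbollah bomb car");
        Set<String> wanted = new HashSet<String>(Arrays.asList("car"));
        System.out.println(countByUser(wanted, log01)); // prints "{3.5.8.13=2, 8.13.21.34=1}"
    }
}
```

Using a set for the terms on each line means a term repeated within one search still counts as a single match, and the order of the terms does not matter ("hezbollah bomb car" matches "car bomb").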

Note: Ascending order of IP address is defined as follows. The dotted decimal IP address a.b.c.d, where a, b, c, and d are integers in the range 0 to 255, is converted to a number using this formula:

a⋅2^24 + b⋅2^16 + c⋅2^8 + d

The result is a number in the range 0 to 2^32−1. Ascending order of IP addresses is ascending order of these numbers. Beware; in Java, type int can only store numbers in the range −2^31 to 2^31−1.
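
A minimal sketch of the conversion, using type long so that addresses at or above 128.0.0.0 do not overflow. The class name IpOrder and its members are hypothetical, and the sketch assumes the addresses are valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: orders dotted decimal IP addresses by their 32-bit value. */
final class IpOrder {

    /** Computes a*2^24 + b*2^16 + c*2^8 + d as a long (range 0 to 2^32-1). */
    static long toNumber(String ip) {
        long n = 0;
        for (String part : ip.split("\\.")) {
            n = (n << 8) | Integer.parseInt(part);
        }
        return n;
    }

    /** Compares two addresses by numeric value, not by string order. */
    static final Comparator<String> ASCENDING = new Comparator<String>() {
        public int compare(String a, String b) {
            long x = toNumber(a);
            long y = toNumber(b);
            return x < y ? -1 : (x > y ? 1 : 0);
        }
    };

    public static void main(String[] args) {
        List<String> ips = new ArrayList<String>(
            Arrays.asList("8.13.21.34", "200.0.0.1", "3.5.8.13"));
        Collections.sort(ips, ASCENDING);
        System.out.println(ips);                    // prints "[3.5.8.13, 8.13.21.34, 200.0.0.1]"
        System.out.println(toNumber("200.0.0.1"));  // prints "3355443201", too large for int
    }
}
```

Sorting the strings directly would put "200.0.0.1" before "3.5.8.13", which is why the comparison goes through the numeric value.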

Your program for Programming Project 1 must calculate and print a separate report for each of several log files, using multiple threads.


Software Requirements

  1. The program must be run by typing this command line:
    java Analyze <searchterms> <filename> ...
    
    1. <searchterms> is a list of one or more search terms.
    2. If there are two or more search terms, each search term is separated from the next by a plus sign.
    3. <filename> is the name of a plain text log file.
    4. There must be one or more file names.

    Note: This means that the program's class must be named Analyze, and this class must not be in a package.

    Note: Here are some example command lines:

    java Analyze car log01.txt log02.txt
    java Analyze car+bomb log01.txt log02.txt log03.txt
    

  2. If the command line does not have the required number of arguments, the program must print an error message on the console and must terminate. The wording of the error message is up to you.

  3. If any log file cannot be read, the program must print an error message on the console and must terminate. The wording of the error message is up to you.

  4. For each log file, the program must print on the console the analysis report for the log file in the format specified above.

  5. For each log file, the lines of the analysis report must not be intermingled with the lines of any other log file's analysis report.

  6. The order in which the log files' analysis reports appear in the output does not matter, so long as all the analysis reports do appear.

  7. The program must not print any blank lines and must not print any output not specified in the above requirements.
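
One way to keep reports from intermingling (requirement 5) is for each thread to build its whole report as a single string and print it while holding a shared lock. The following is a sketch under that assumption, not the approach prescribed in class; the class ReportPrinter and the lock object are hypothetical, and the sketch ends lines with '\n' rather than the platform line separator.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: prints one complete report at a time. */
final class ReportPrinter {

    /** Shared by all threads so only one report is written at a time. */
    private static final Object CONSOLE_LOCK = new Object();

    /** Builds the report text: the file name, then one line per user. */
    static String format(String fileName, List<String> userLines) {
        StringBuilder sb = new StringBuilder();
        sb.append(fileName).append('\n');
        for (String line : userLines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    /** Prints the whole report while holding the lock, so no other report can interleave. */
    static void print(String fileName, List<String> userLines) {
        String report = format(fileName, userLines);
        synchronized (CONSOLE_LOCK) {
            System.out.print(report);
            System.out.flush();
        }
    }

    public static void main(String[] args) {
        print("log02.txt", Arrays.asList("3.5.8.13 1", "8.13.21.34 1"));
    }
}
```

Building the string outside the lock keeps the time spent holding the lock short; only the final write is serialized.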

Clarifications to the Requirements

  1. Q: What is supposed to happen if the user specifies no search terms, e.g. java Analyze log01.txt log02.txt?
    A: The first command line argument is always the list of search terms. In the above command, the search term is "log01.txt".

  2. Q: What is supposed to happen if there are no results for a given log file?
    A: Only the log file name is printed.

  3. Q: Can we assume that the IP addresses are valid?
    A: Yes.

  4. Q: What is supposed to happen if the user specifies the same file name twice?
    A: The file gets analyzed twice, and two printouts for the file appear in the output.

  5. Q: What is supposed to happen if one of the log files does not exist but the other log files do? Is the program supposed to stop, or is the program supposed to ignore the missing log file and analyze the other log files?
    A: In this case, the program should only print an error message; it should not process any log files.

  6. Q: What if the user is searching for a string with a "+" character in it? Is that allowed?
    A: The search terms will not include plus signs.

  7. Q: What is the program supposed to do if a search term is repeated, e.g. java Analyze car+car log01.txt log02.txt?
    A: The program should treat this as though there were only one occurrence of the repeated search term.

  8. Q: If a log file is located in a separate directory, should the program print only that file's name, or the relative paths to the file?
    A: The program should print whatever name was given as the command line argument.
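
Several of these clarifications concern the command line, and they can be sketched together: the first argument is always the term list, repeated terms collapse to one, and every file is checked before any analysis starts. The class CommandLineSketch, its method names, and the wording of the messages are hypothetical. Writing error messages to System.err is an assumption, since the requirements only say "on the console".

```java
import java.io.File;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: validates and parses the command line arguments. */
final class CommandLineSketch {

    /** Splits "car+bomb" on plus signs, lowercases each term, and drops repeats. */
    static Set<String> parseTerms(String arg) {
        Set<String> terms = new LinkedHashSet<String>();
        for (String t : arg.split("\\+")) {
            if (t.length() > 0) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Returns the first file name in args[1..] that cannot be read, or null if all can. */
    static String firstUnreadable(String[] args) {
        for (int i = 1; i < args.length; i++) {
            File f = new File(args[i]);
            if (!f.isFile() || !f.canRead()) {
                return args[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: java Analyze <searchterms> <filename> ...");
            return;
        }
        String bad = firstUnreadable(args);
        if (bad != null) {
            System.err.println("Cannot read log file: " + bad);
            return;   // no log file is processed if any one is unreadable
        }
        System.out.println(parseTerms(args[0]));
    }
}
```

For example, parseTerms("car+Car+bomb") yields the two terms car and bomb. Note that a file passing this check could still fail later while being read, so a complete program would also need to handle I/O errors during the analysis itself.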


Software Design

  1. The program must consist of multiple threads, where each thread processes one and only one log file.

  2. The program's main thread must do no work except to parse the command line arguments and create the aforementioned analysis threads, and only those threads.

  3. The program must follow the thread programming patterns studied in class, be designed using object oriented design principles as appropriate, and make use of reusable software components as appropriate.

  4. Each class or interface must include a Javadoc comment describing the overall class or interface. Each method within each class or interface must include a Javadoc comment describing the overall method, the arguments if any, the return value if any, and the exceptions thrown if any.

  5. The program must follow a consistent and readable coding style.
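
The overall thread structure in items 1 and 2 can be sketched as one Runnable per log file, each started in its own thread by the main thread. This is only an outline under assumptions: the names ThreadPerFileSketch and FileTask are hypothetical, the task body is a placeholder that prints the file name instead of analyzing the file, and the thread programming patterns studied in class may differ.

```java
/** Hypothetical outline: the main thread only creates and starts one thread per file. */
final class ThreadPerFileSketch {

    /** Work for one log file; the body here is a placeholder for the real analysis. */
    static final class FileTask implements Runnable {
        private final String fileName;

        FileTask(String fileName) {
            this.fileName = fileName;
        }

        public void run() {
            // Placeholder: a real task would read the file and print its full report.
            System.out.println(fileName);
        }
    }

    /** Creates and starts one thread per file name, returning the started threads. */
    static Thread[] startAll(String[] fileNames) {
        Thread[] threads = new Thread[fileNames.length];
        for (int i = 0; i < fileNames.length; i++) {
            threads[i] = new Thread(new FileTask(fileNames[i]));
            threads[i].start();
        }
        return threads;
    }

    public static void main(String[] args) {
        // Assumed sample file names; a real program would take them from args.
        startAll(new String[] {"log01.txt", "log02.txt"});
        // main returns here; the JVM exits once all non-daemon threads have finished.
    }
}
```

The main thread does not need to wait for the analysis threads: they are non-daemon threads by default, so the program keeps running until each one has terminated.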


Submission Requirements

Your project submission will consist of a Java archive (JAR) file containing the Java source file for every class and interface in your project. Put all the source files into a JAR file named "<username>.jar", replacing <username> with the user name from your Computer Science Department account. The command is:

jar cvf <username>.jar *.java

If your program uses classes or interfaces from the Computer Science Course Library without changes, then you do not need to include these classes' or interfaces' source files in your JAR file. If your program uses classes or interfaces from the Computer Science Course Library with changes, then you do need to include these classes' or interfaces' source files in your JAR file.

Send your JAR file to me by email at ark@cs.rit.edu. Include your full name and your computer account name in the email message, and include the JAR file as an attachment.

When I get your email message, I will extract the contents of your JAR file into a directory. However, I will not replace any of the source files in the Computer Science Course Library with your source files; your project must compile and run with your files in their own separate directory. (You can do this project without needing to replace any source files in the Computer Science Course Library.) I will set my Java class path to include the directory where I extracted your files and the directory where the Computer Science Course Library is installed. I will compile all the Java source files in your program using the JDK 1.6.0 compiler. I will then send you a reply message acknowledging I received your project and stating whether I was able to compile all the source files. If you have not received a reply within one business day (i.e., not counting weekends), please contact me. Your project is not successfully submitted until I have sent you an acknowledgment stating I was able to compile all the source files.

The submission deadline is Wednesday, December 12, 2012 at 11:59pm. The date/time at which your email message arrives in my inbox (not when you sent the message) will determine whether your project meets the deadline.

You may submit your project multiple times before the deadline. I will keep and grade only your most recent submission that arrived before the deadline. There is no penalty for multiple submissions.

If you submit your project before the deadline, but I do not accept it (e.g. I can't compile all the source files), and you cannot or do not submit your project again before the deadline, the project will be late (see below). I strongly advise you to submit the project several days before the deadline, so there will be time to deal with any problems that may arise in the submission process.


Grading Criteria

I will grade your project by:

When I run your program, the Java class path will point first to the directory with your compiled class files, followed by the directory where the Computer Science Course Library is installed. I will use JDK 1.6.0 to run your program.

I will grade the test cases based solely on whether your program produces the correct output as specified in the above Software Requirements. Any deviation from what is specified will result in a grade of 0 for the test case. This includes errors in the formatting (such as extra spaces, missing spaces, the use of a tab instead of a space), incorrect upper/lowercase, incorrect punctuation, misspelled words, missing output, and extraneous output not called for in the requirements. The requirements state exactly what the output is supposed to be, and there is no excuse for outputting anything different. If any requirement is unclear, please ask for clarification.

If there is a defect in your program and that same defect causes multiple test cases to fail, I will deduct points for every failing test case. The number of points deducted does not depend on the size of the defect; I will deduct the same number of points whether the defect is 1 line, 10 lines, 100 lines, or whatever.

After grading your project I will put your grade and any comments I have in your encrypted grade file. For further information, see the Course Grading and Policies and the Encrypted Grades.

The log files used to grade the test cases are:
log01.txt
log02.txt
log03.txt
log04.txt


Late Projects

If I have not received a successful submission of your project by the deadline, your project will be late and will receive a grade of 0. You may request an extension for the project. There is no penalty for an extension. See the Course Policies for my policy on extensions.


Plagiarism

Programming Project 1 must be entirely your own individual work. I will not tolerate plagiarism. If in my judgment the project is not entirely your own work, you will automatically receive, as a minimum, a grade of zero for the assignment. See the Course Policies for my policy on plagiarism.


Resubmission

If you so choose, you may submit a revised version of your project after you have received the grade for the original version. You are allowed to make one and only one resubmission of the project. However, if the original project was not successfully submitted by the (possibly extended) deadline or was not entirely your own work (i.e., plagiarized), you are not allowed to submit a revised version. Submit the revised version via email in the same way as the original version. I will accept a resubmission up until 11:59pm Tuesday 08-Jan-2013. I will grade the revised version using the same criteria as the original version, then I will subtract 2 points as a resubmission penalty. The revised grade will replace the original grade, even if the revised grade is less than the original grade.

Copyright © 2012 Alan Kaminsky. All rights reserved. Last updated 18-Dec-2012. Please send comments to ark@cs.rit.edu.