Lab 10: Hashing

Copyright RIT 2008
$Id: writeup.xml,v 1.12 2009/05/01 18:28:10 vcss233 Exp $


Goal

You must submit this lab by the MIDNIGHT OF SUNDAY AT THE END OF WEEK 10. This applies to students in ALL lab sections of the course.

This lab will be done in TEAMS of two students each. Your lab instructor will form teams for this lab and will tell you which team you are on when you arrive for lab. One member of each team is responsible for submitting the work for the team, but all members of the team must include their names in all code files that are submitted.

  1. Complete the code for a hash table implementation that uses a simple hashing algorithm and linear probing to resolve collisions.
  2. Provide a test program that tests your hash table implementation.

Overview

The purpose of this lab is to complete a hash table implementation. We use an array to hold the hash table. The hash function takes a key, applies the hashCode method found in the Object class, and takes the result mod the array size as the initial hash address. Collisions are resolved by using linear probing. In this implementation, we avoid having to keep track of whether we have checked every possible location in the hash table because we do not allow the hash table to ever become full. When we reach a threshold value (provided as a constant in the code supplied to students), we automatically expand the hash table by essentially doubling its size. At this point, we re-hash all valid entires into the new hash table. This formulation of the problem also allows a user to remove entries from the hash table. When entries are removed, they are marked with a special indicator. Whenever a hash table is resized, any deleted entries are then discarded and not stored in the expanded table.

Pre - Lab Work

  1. Review your course notes on hashing.

  2. Read Goodrich and Tamassia, Chapter 9, Section 2

In-Lab Activities

There is only one activity for this lab.

Activity #1 : Complete a Hash Table Implementation and Test It

Download a copy of the jar file: lab10.jar (http://www.cs.rit.edu/~vcss233/pub/lab10/Binaries/lab10.jar) that contains the file you need to complete this lab and unpack it in your CS3 lab subdirectory. In the lab10 directory, you will find an RCS directory that contains the files SimpleHashMap.java and OpenHashTable.java. The file SimpleHashMap.java contains an interface definition for a simplified hash map. The interface is generic meaning that both keys and values stored in the hash map may be whatever type of object a user chooses. The file OpenHashTable.java contains a partial implementation of a class that implements this interface. Instance variables, including the array that serves as the hash table container, have already been entered, as well as an inner class that is used to hold key/value pairs that make up the entries stored in the hash table. There are also several constants available for your use. We set the initial capacity of the hash table to an unrealistically small value, but this is done so that you can easily develop tests of your implementation, especially for the case when expanding the table size is warranted. Finally, all of the methods that you are expected to complete have been stubbed out and entered into the file. Most of the methods are directly related to the interface that this class supports, but there are other methods as well. There are extensive comments included throughout the class. In addition, we've provided Javadoc descriptions for the main classes involved; namely, SimpleHashMap, OpenHashTable, OpenHashTable.Entry, and TestOpenHashTable. Notice that the Javadoc pages reveal both public and private information so you can see it all. We've included the Javadoc for the inner class, even though you don't have to write it, just so you have a convenient way of seeing this class and we've included the Javadoc for the test program that you will write. Since you will be making changes to the code in OpenHashTable.java, be sure to use RCS to maintain a record of these changes. In addition, when you create your test program, be sure to check that into RCS as well. In order to obtain full credit for this activity you must submit working programs and use RCS correctly.

While we often suggest in labs and projects a process for you to follow in completing your work, this lab offers an excellent opportunity for you to practice "incremental coding and testing". Below is additional information concerning most of the methods that you need to complete. Consider trying an approach in which you code one method at a time, then add just a little bit more code to your test program, then try running your test program. Our first suggestion is for you to fill in a debugging routine that prints the contents of the hash table. Once you have that done, you can write a test program to construct a hash table and display the initial contents. It should appear to be empty, and you can see these results from using your debugging routine. The whole point in having a debugging routine is that you can use it throughout your development effort to confirm that more substantial methods are working correctly. Another nice aspect of this lab is that if you understand hashing, you can predict what the hash table should look like under different sets of operations. Having a working debugging routine gives you a way of confirming that your code is working correctly. You've probably got the idea now.

For the rest of the methods you write, you may want to test after you have written one or two methods.

  1. You should start by filling in the code for the printTable method. This method is used for debugging and helps you see the contents of the hash table at any point in time. You should, of course, familiarize yourselves with the interface defined in SimpleHashMap.java and the already completed parts of OpenHashTable.java.
  2. The find method is needed by many others; you should complete this method next. The algorithm for this method is:
    1. Set the index to the result of first applying the hashCode method to the key and then taking the result mod the length of the array.
    2. If the resulting index value is negative, then add the length of the array to it.
    3. While array[index] is not empty and the key is not at this position in the array, do the following two steps: (1) increment the index by 1, and (2) reset the index to 0 if the new value of the index is now greater than or equal to the length of the array.
    4. Go back and check the loop termination condition again.
    5. When the loop terminates, the method is done, so return the value of index.
  3. The put method is next because you want to be able to store items in the hash table. The algorithm for this method is:
    1. Find the first array element that is empty or find the array element that contains the key. We can make use of the find method.
    2. If an empty array location is found, then do the following three steps: (1) insert the new entry and increment numKeys, (2) check to see if the table needs to be expanded (you would call rehash if it is needed; the way to check is to compute the percentage of the available array that is taken up by either actual keys or deleted keys and then see if this percentage exceeds the threshold defined by the constant at the beginning of the class), and (3) return null.
    3. If the key is found, then replace the value associated with this entry in the array and return the old value.
  4. Developing the remove method will also make use of the find method. The algorithm for this method is:
    1. Find the first array element that is empty or find the array element that contains the key.
    2. If an empty array location is found, then simply return null.
    3. If the key is found, then remove this entry from the array by setting it to reference DELETED, increment numDeletes, and decrement numKeys.
    4. Return the old value associated with this key.
  5. Finally, we can also make use of the find method in developing get. The algorithm for this method is:
    1. Find the first array element that is empty or find the array element that contains the key.
    2. If the array element found contains the key, then return the value associated with this entry; otherwise, return null.
  6. The rehash method is the last method we'll discuss in any detail. The algorithm for this method is:
    1. Create a local array whose size is twice the size of the current hash table plus 1 (this ensures that the resulting size is always odd; although we're not insisting on this in this lab, many experts recommend that the size of the hash table should always be a prime number). You should look at the provided constructor for specifics on how to build and initialize the new table.
    2. Reset numKeys and numDeletes to 0.
    3. Using the same approach you've used elsewhere, reinsert non-deleted entries from the current hash table into the expanded hash table. Any entries that had been marked as deleted are discarded and not reinserted.
  7. When you reach this point, there are just a few, simple methods left for you to complete. Make sure you do complete these methods and include them in your plans for testing.

How To Submit

When you are sure that your hash table implementation is working properly and that your test program sufficiently tests all features of your implementation, submit your work by executing the following command:


   try grd-233 lab10-1 OpenHashTable.java TestOpenHashTable.java
                


Grade Computation

Grade Breakdown: