CIT 594 Hash Tables
Spring 2006, David Matuszek

Purposes:

General Idea:

In this assignment you are to reinvent the wheel implement your own version of a bucket hash. You will measure how fast it performs, both in an absolute sense and in a Big-O sense, and you will compare it against a Sun-provided hash table class.

A bucket hash, you should recall, is a hash table that consists of an array of linked lists (buckets). Your hash code tells you which list an item belongs in, and you insert or retrieve the item from that list.

Details:

Create the following classes:

Here's the overall structure:

To put something in your hash table, you call myHashTable.put(key, value). This method should (1) get the hashCode() of the key, (2) use this hash code to find an index into the array, and (3) search the list at this index location for a Cell containing a Pair with a matching key field. If you find such a Pair, then (4a) replace the value field in the Pair with the new value; but if you don't find such a Pair, then (4b) create a new Cell containing a new Pair and add it to this List.

Retrieving something from your hash table will require some of the same operations.

One slightly tricky bit is that there is no use for the classes Pair, Cell, and List outside of the HashTable class, so it's bad style to expose these to the user (especially List, which will be an incomplete implementation). For this reason, they should be inner classes of HashTable. Ideally, they should be private inner classes, but this gets in the way of testing List, so give them default (package) access. Finally, to avoid some tricky syntax, they should be static inner classes; this lets you refer to them as HashTable.Pair, HashTable.Cell, and HashTable.List.

Start with the provided code, which contains:

I've marked with TODO flags the things you need to add to these classes. (In Eclipse, select Window Show View Other... Basic Tasks to get tab that shows a list of these.) Of course, you should delete these flags when you have completed the indicated tasks.

public class Thing

Your HashTable class should, of course, be able to use (almost) any kind of Object as a key, and any kind of Object as a value. I'd like you to use Things for your testing, both as keys and as values.

The Thing class is almost complete, but lacks a hashCode() method. Look up java.lang.Object.hashCode() in the Java API, and be sure you satisfy the "general contract." Also be sure you are actually overriding the corresponding inherited methods in both cases (use the @Override tag). No other methods are needed; we are only going to use this class to test out the HashTable class.

static class List (inside HashTable)

Develop the List class next, since the HashTable class depends on it.

Your List class has a single instance variable, Cell header, which references (points to) the first Cell in the list. If the list is empty, header is null.

You need the following methods:

public void add(Pair pair)
Adds or replaces a Pair on this List (a list may not contain two pairs with equal keys). If a pair with the same key is already in the list, this method should replace the value in that pair; otherwise, it should just add the pair to this List.
public Pair find(Object key)
Given a key, return the Pair it's in.
public int size()
Returns the number of Pairs in this List.
public Set keySet()
Returns a set of all the keys in this list.

As usual, I recommend that you start by writing the JUnit tests for these methods.

public class HashTable implements java.util.Map

This class will implement an interface that declares lots of methods. Let Eclipse add the required methods for you (Add unimplemented methods). You will only implement a few of these methods: get, put, size, clear, isEmpty, keySet, and equals. Leave the rest as stubs. Add Javadoc comments to all public methods and instances; for the unimplemented methods, just say they're unimplemented.

The generated code does not use generics. For this assignment, that's OK.

Write JUnit tests for the following methods (let Eclipse generate the test stubs for you):

get, put, size, clear, isEmpty, keySet, and equals

If you are not clear about what these methods should do, see the Java API for Map.

The methods get and put are fundamental to all hash tables. The methods size, clear, and isEmpty are simple to write and convenient to have. The keySet method returns a Set, and a Set has an iterator() method, which you can use when you write equals. And equals will be very useful in your testing.

In your testing, I recommend the following: In addition to your HashTable, create a java.util.HashMap. Whatever you do (get, put, etc.) to your HashTable, also do the the HashMap, then test if the results are equal (if you override boolean equals(Object), JUnit's assertEquals method will use it).

Do your testing with Things for both keys and the values.

public class HashTableTimer

When you have your HashTable class working, time how long it takes to do the get and put methods.Time these separately.

Here's how you time code:

long startTime = new GregorianCalendar().getTimeInMillis();
// Coded to be timed goes here
long endTime = new GregorianCalendar().getTimeInMillis();
long elapsedTime = endTime - startTime;

Since any given execution should take far less than a millisecond, the code you are timing should be a loop that does numerous calls to put (or get). Do multiple timings, with differing numbers of calls, so that you can see how time increases as you add more and more things to your hash table (I used various values from 100 to 100000). Print out your results.

Since you are trying to time particular methods, the less additional work done within the timed part, the better. In particular, you should create an array of Things to use before you enter the loop.

If you carefully follow the DRY principle, your timer method should be quite short (Mine is <70 lines, including comments). You don't need lots of loops.

After you have timed your own get and put methods, time the same methods of java.util.HashMap. Since both HashTable and HashMap are Maps, you can use the same timer method for both, as long as it takes a Map as a parameter.

Analyze and write up your results

Your HashTable methods will be much slower than the HashMap methods. That's OK.

Write up your results, including your timing measurements. Try to answer the following questions:

Note: Your writeup will be part of your grade on this assignment.

Due date:

Please zip up all the files in this project, and submit them via Blackboard by midnight Thursday, February 16.