| CIT
594 Hash Tables Spring 2006, David Matuszek |
In this assignment you are to reinvent the wheel implement your own
version of a bucket hash. You will measure how fast it performs, both in an
absolute sense and in a Big-O sense, and you will compare it against a Sun-provided
hash table class.
A bucket hash, you should recall, is a hash table that consists of an array of linked lists (buckets). Your hash code tells you which list an item belongs in, and you insert or retrieve the item from that list.
Create the following classes:
public class HashTable implements Map
static class Pair -- a static inner class of HashTable;
holds a key and a valuestatic class Cell -- a static inner class of HashTable;
holds a value and a nextstatic class List -- a static inner class of HashTable;
holds a header to a linked list of Cellspublic class ListTest extends TestCasepublic class HashTableTest extends TestCasepublic class Thing -- a class of objects to use as test datapublic class HashTableTimer -- used to time your HashTable
classHere's the overall structure:
HashTable will be an array of a reasonable size, preferably
a prime number (101 is good).
List.
List will contain a header to a linked
list of Cells.
Cell will contain, in its value
field, a Pair.
Pair will contain a key and
a value, both of which are ObjectsTo put something in your hash table, you call myHashTable.put(key, value).
This method should (1) get the hashCode() of the key,
(2) use this hash code to find an index into the array, and (3) search the list
at this index location for a Cell containing a Pair
with a matching key field. If you find such a Pair,
then (4a) replace the value field in the Pair with
the new value; but if you don't find such a Pair,
then (4b) create a new Cell containing a new Pair
and add it to this List.
Retrieving something from your hash table will require some of the same operations.
One slightly tricky bit is that there is no use for the classes Pair,
Cell, and List outside of the HashTable
class, so it's bad style to expose these to the user (especially List,
which will be an incomplete implementation). For this reason, they should be
inner classes of HashTable. Ideally, they should be private
inner classes, but this gets in the way of testing List, so give
them default (package) access. Finally, to avoid some tricky syntax, they should
be static inner classes; this lets you refer to them as HashTable.Pair,
HashTable.Cell, and HashTable.List.
Start with the provided code, which contains:
public class HashTable implements Map -- You need to implement
most of this
static class Pair -- I've implemented this for youstatic class Cell -- I've implemented this for youstatic class List -- You need to implement thispublic class Thing -- I've implemented all but one method for
youI've marked with TODO flags the things you need to add to these classes. (In Eclipse, select Window → Show View → Other... → Basic → Tasks to get tab that shows a list of these.) Of course, you should delete these flags when you have completed the indicated tasks.
public class ThingYour HashTable class should, of course, be able to use (almost)
any kind of Object as a key, and any kind of Object
as a value. I'd like you to use Things for your testing, both as
keys and as values.
The Thing class is almost complete, but lacks a hashCode()
method. Look up java.lang.Object.hashCode() in the Java API, and
be sure you satisfy the "general contract." Also be sure you are actually
overriding the corresponding inherited methods in both cases (use the @Override
tag). No other methods are needed; we are only going to use this class to test
out the HashTable class.
static class List
(inside HashTable) Develop the List class next, since the HashTable
class depends on it.
Your List class has a single instance variable, Cell header,
which references (points to) the first Cell in the list. If the
list is empty, header is null.
You need the following methods:
public void add(Pair pair)public Pair find(Object key)public int size()public Set keySet()As usual, I recommend that you start by writing the JUnit tests for these methods.
public class HashTable implements java.util.MapThis class will implement an interface that declares lots of methods. Let Eclipse
add the required methods for you (Add unimplemented methods). You
will only implement a few of these methods: get, put,
size, clear, isEmpty, keySet,
and equals. Leave the rest as stubs. Add Javadoc comments to all
public methods and instances; for the unimplemented methods, just say they're
unimplemented.
The generated code does not use generics. For this assignment, that's OK.
Write JUnit tests for the following methods (let Eclipse generate the test stubs for you):
get,put,size,clear,isEmpty,keySet, andequals
If you are not clear about what these methods should do, see the Java API for
Map.
The methods get and put are fundamental to all hash
tables. The methods size, clear, and isEmpty
are simple to write and convenient to have. The keySet method returns
a Set, and a Set has an iterator() method,
which you can use when you write equals. And equals
will be very useful in your testing.
In your testing, I recommend the following: In addition to your HashTable,
create a java.util.HashMap. Whatever you do (get, put, etc.) to
your HashTable, also do the the HashMap, then test
if the results are equal (if you override boolean equals(Object),
JUnit's assertEquals method will use it).
Do your testing with Things for both keys and the values.
public class HashTableTimerWhen you have your HashTable class working, time how long it takes to do the
get and put methods.Time these separately.
Here's how you time code:
long startTime = new GregorianCalendar().getTimeInMillis(); // Coded to be timed goes here long endTime = new GregorianCalendar().getTimeInMillis(); long elapsedTime = endTime - startTime;
Since any given execution should take far less than a millisecond, the code
you are timing should be a loop that does numerous calls to put
(or get). Do multiple timings, with differing numbers of calls,
so that you can see how time increases as you add more and more things to your
hash table (I used various values from 100 to 100000). Print out your results.
Since you are trying to time particular methods, the less additional work done
within the timed part, the better. In particular, you should create an array
of Things to use before you enter the loop.
If you carefully follow the DRY principle, your timer method should be quite short (Mine is <70 lines, including comments). You don't need lots of loops.
After you have timed your own get and put methods, time the same methods of
java.util.HashMap. Since both HashTable and HashMap
are Maps, you can use the same timer method for both, as long as
it takes a Map as a parameter.
Your HashTable methods will be much slower than the HashMap
methods. That's OK.
Write up your results, including your timing measurements. Try to answer the following questions:
put and get methods ought
to be?put
and get methods really? Do the results agree with what
you think they should be?put
and get methods in HashMap?HashMap is using the same basic algorithm (bucket
hash) as your HashTable?Note: Your writeup will be part of your grade on this assignment.
Please zip up all the files in this project, and submit them via Blackboard by midnight Thursday, February 16.