Project 2: Chord505, a Simplified Distributed Hash Table

Administrivia

TaskDeadline
Assigned:Monday, March 22nd, 2004
Choose teams by:Friday, March 26th, 2004
Checkpoint due by: 23:59 PM EDT, Friday, April 9th, 2004
Implementation and final report due by: 23:59 PM EDT, Friday April 16th, 2004

Introduction

You are to design and implement Chord505, a simplified Distributed Hash Table based on Chord. You have been provided with a version of DHT simulator and one of the simplest DHT lookup algorithm implementation, the Ring. This way you can concentrate on designing and implementing your protocol rather than framework issues.

Project description

A DHT lookup algorithm is the foundation of all structured peer-to-peer systems. It provides support for one operation: given a key, it maps the key onto a node. Your goal is to design a scalable lookup algorithm and implement it in the given simulator.

Your main reference should be chord, which provides a relatively simple lookup algorithm. It has the provable scalable property: A lookup only traverses O(log N) nodes. Your design is suggested to be based on chord.

Even with such a reference, a real DHT implementation could be complicated: The nodes can join and leave simultaneously or fail silently; The network could be partitioned. In general, handling byzantine faults is hard. In this project, you do not have to deal with those "churning" effects. In particular, you only have to handle sequential node join and leave. When a node leaves the DHT, it will have the opportunity to notify other relevant nodes about the event.

Doing the project

Students should form programming teams. A programming team should have no more than three people, there will be no exceptions. Notify the TAs by email (cis505@seas.upenn.edu) of your groups. Specify the names and email addresses of the members in your programming team. If you're having trouble forming a group, post a note to the class newsgroup.

Each programming team is required to write a complete implementation of Chord505 DHT. Code sharing between teams is prohibited. If a team contains more than one people, they will receive equal credits whatsoever. (Read as do not complain about your partners.) We do not have design teams in project 2.

We suggest that you stick to the given simulator, but you are free to make any additions that you think improve the simulation process, as long as

  1. you document them clearly
  2. you clear all changes with us before you implement them.

Grading

  1. 30% is based on your submission at the checkpoint. This version of your implementation should support join and, of course, lookup operations. leave and other operations do not have to work at this moment.
  2. 40% is based on the correctness and performance of your final implementation. Part of the test cases is given in the release tarball.
  3. The remaining 30% is based on your final report. You should design your own test cases to evaluate your implementation. The report should read as a workshop paper format. In the report, we expect to see the design and implementations of your Chord505, as well as the evaluations and lessons you have learned from the project.
  4. 15% Extra credits: You can extend the implementation beyond the basic requirement to gain extra credits. Several possible extensions are: (1) Performance enhancement based on network distances. Chord has O(log N) hop performance in the virtual ring. However, the adjacent nodes in the ring may actually be far away in terms of network distance. How to take advantage of network distance information? Read the CFS and i3 papers may give your some ideas. (2) Fault tolerance. The nodes in real life may fail arbitrarily. How does that impact to your DHT implementation? How does network failures impact the DHT as well? (3) Security. What if part of the nodes do not behave according to the protocol, or even maliciously? Other interesting topics are also welcome. Please analysis your new algorithms in a quantitative approach and be explicit about whether the idea is your own or proposed by other people. In the latter case, cite it. Talk to the course staff at least 2 days ahead of the deadline if you think you are doing great stuff and need an extension.
  • Late penalty:within 1 hour 10% off; within 1 day 30% off; within 2 days 60% off. Otherwise, zero score automatically.

    What to submit

    1. team formation date: email your team decision to us, one email per team.
    2. check point due:
      the basic implementation in folder src/ and the compiled classes in folder bin/
      a README file that briefly describes how to run your DHT
      a TEAM.txt file containing your name(s) and Email address(es).
    3. final due:
      the full implementation in folder src/ and the compiled classes in folder bin/
      a README file that briefly describes how to run your DHT
      a TEAM.txt file containing your name(s) and Email address(es).
      a report.{ps,pdf} as your final report. M$ Word format will not be accepted!

    Getting Started

    You can download our implementation of the simulator from here: (chord505-0.3.tar.gz).

    The tarball contains two packages, algorithm and simulator. Your main focus is to implement a Chord505 class that implements the interface algorithm.Protocol. The algorithm.Protocol has following methods:

    methoddescription
    joina new node joins the DHT
    leavea node currently in DHT leaves
    updatenotify events between nodes
    stablizefix broken/inefficient pointers/fingers
    lookupfind a node that is closer to the one that is responsible for the given key

    There is already a working implementation of a DHT, algorithm.Ring. It is an example of implementing the DHT and using the whole simulator. The algorithm is extremely simple: each node maintains a pointer to its successor in the virtual ring. If the node does not have the responsibility to the key, it will point to its successor for further lookup. The lookup complexity is O(N) in the virtual network. simulator.SimpleTest contains an example of using the Ring implementation and the simulator. The real get and put hash table interfaces are implemented by the class simulator.Node. It will call the underlying lookup protocol to find the correct node and get/put key, value pairs. Please read the doc and the source code for details.

    An elegant Chord505 basic implementation do not have to modify the existing source code. Subclassing algorithm.Protocol and simulator.TestCase should suffice.

    Complete references

    Errata

    Submission Guideline

    The project should be submitted electronically. We use the "turnin" command on Eniac for project 2. Only one member of a team need to submit. Please do NOT submit multiple copies.

    Checkpoint submission:
    turnin -c cis505 -p checkpoint chord505.tar.gz

    Final submission:
    turnin -c cis505 -p final chord505.tar.gz

    Note that you have to run this command on Eniac (red.seas or blue.seas). Type turnin -h for more information.

    caveat on resubmission: Re-run "turnin" will completely overwrite your last submission, including files and timestamps. For example, if you have submitted something before the checkpoint deadline, and resubmit something after the deadline with -p checkpoint rather than -p final, you will be considered as having a late checkpoint submission and be penalized for your lateness.

    How to compile and run

    See the README file in the latest tarball.

    Where do I find an IDE

    Using an IDE can dramatically shorten you develop and debugging time. For Java, the Eclipse IDE is highly recommended. It has versions for both Windows and Linux.

    Where do I find a Java compiler

    Some of you have had difficulty finding the Java compiler on the system of your choice. The following table lists the locations of appropriate Java compilers on various SEAS systems:
    Host/typeLocation
    eniac, seas, red, bluejavac in /pkg/j/j2sdk1.4.0/bin
    gradientjavac in /pkg/j/j2sdk1.4.1_01/bin
    halfdome, m100d-lin?,
    and other SEAS Linux systems
    javac in /usr/java/j2sdk1.4.1_01/bin or
    jikes in /home8/c/cis505/local-linux/bin