Homework assignments for CIS 455 / 555

Resources

For most assignments, we will provide a virtual machine image that contains all the necessary tools. To use this image, you will need VirtualBox (free), VMware Workstation Player (free for personal use), or VMware Fusion (not free).

Development will be in Java. We recommend the use of Git, a version control system, for maintaining your project code; if you are not familiar with Git, please have a look at the documentation. As a development environment, you may want to use Eclipse, possibly in combination with the EGit plug-in.

Assignment 0 Using the Virtual Machine Image (pdf)

This very simple assignment will show you how to use the virtual machine image we have prepared for you. You also need to download the VM image.

Assignment 1 Web and application server (pdf)

Some useful URLs:

Assignment 2 Web crawler and XPath engine (pdf)

For testing, we have set up a sandbox that you can safely crawl. For MS1, you may want to have a look at the BDB handout; for MS2, the StormLite handout may be useful.

Assignment 3 Storm and MapReduce (pdf)

For this assignment, you will extend StormLite into a distributed framework that essentially emulates Apache Storm and, in the process, also emulates MapReduce. We have also prepared a handout with more information about StormLite and the MapReduce workflow.
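
As a rough illustration of the MapReduce workflow mentioned above, here is a minimal single-process word-count sketch in plain Java. It does not use the StormLite or Apache Storm APIs; the class name WordCountSketch and its methods map, reduce, and run are hypothetical names chosen for this example. It only shows the three conceptual phases (map, group by key, reduce) that a distributed framework would spread across workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of StormLite or the assignment code.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values that were emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map, then groups the pairs by key (the "shuffle"), then reduces.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {be=2, crawl=1, not=1, or=1, to=3}
        System.out.println(run(List.of("to be or not to be", "to crawl")));
    }
}
```

In a distributed setting, the grouping step is where pairs with the same key would be routed to the same worker; here a sorted in-memory map stands in for that routing.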

Final Project Distributed web crawler and search engine (pdf)

In addition to the PDF, you may find the following useful: