Homework assignments for CIS 455 / 555


We recommend using Git, a version control system, to maintain your project code; if you are not familiar with Git, please have a look at the documentation. Other tools used include Vagrant/VirtualBox, Eclipse, Maven, and Bitbucket. We will familiarize ourselves with these tools throughout the semester.

HW 0

Setting up Linux and Eclipse
This assignment will help you establish the infrastructure you'll need for the remaining homeworks.

HW 1

Web and application server
Some useful URLs:
  • SparkJava's website provides sample usage that mirrors what we are after.
  • You may also want to have a look at our Basic Testing Guide for some helpful tips.

HW 2

Web crawler and XPath engine
For testing, we have set up a sandbox that you can safely crawl. For MS2, you will also need the StormLite handout.

HW 3

Storm and MapReduce
For this assignment, you will extend StormLite into a distributed framework that essentially emulates Apache Storm and, in the process, also emulates MapReduce.

Final Project

Distributed web crawler and search engine
In addition to the PDF, you may find the following useful: Getting started guide for Amazon EC2.