CIT 595 Spring 2011

Homework #5

Due Sat Apr 2 12:00 noon


Introduction

In this assignment, you will develop a basic web server in C++. Your implementation will integrate concepts we have seen throughout this course and in CIT 593: caching, threading, synchronization, networking, file I/O, etc. Note that this assignment is considerably more complex than others you have done so far, but you will work in pairs and have two weeks to complete this homework. Additionally, we will dedicate two lab sessions to this assignment so that you can get help from the instruction staff.

Note that once you have finished Part 1, the remaining parts can be done in any order, independently of each other. Figure out which one seems easiest to you and work on that part first, then tackle the others.


Before you begin
Download HttpServer.h and HttpServer.cpp. This is a very basic implementation of a server that accepts an HTTP request and echoes back the name of the page that was requested. Make sure you are able to understand and compile this code.

When you run the program, you need to specify the port number on which the server should listen. This should be given as a command-line argument. Note, however, that you cannot use port numbers below 1024: on Unix-like systems these are reserved ports that only a privileged (root) process may bind.
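As an illustration only, the port check could be done with a small helper like the one below. The name parsePort and its -1 error convention are hypothetical, not part of the provided skeleton; it simply rejects a missing, non-numeric, or out-of-range argument before the server tries to bind.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the port given as argv[1], or -1 if it is
// missing, not a plain integer, or outside the allowed range 1024..65535.
int parsePort(int argc, char* argv[]) {
    if (argc < 2) {
        return -1;  // no port argument supplied
    }
    char* end = nullptr;
    long port = std::strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0') {
        return -1;  // argument was empty or had trailing non-digits
    }
    if (port < 1024 || port > 65535) {
        return -1;  // reserved (privileged) or invalid port number
    }
    return static_cast<int>(port);
}
```

In main, a return value of -1 would be a good place to print a usage message such as "usage: server <port>" and exit.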

Check that you are able to test the server by connecting to the host:port using a web browser. For instance, if you are using your own computer and start the server on port 8080, you could point your web browser to http://localhost:8080/test.html and you should see the response "You requested /test.html".

To stop the server, just use Ctrl-C.

Note that if you are using eniac or lab machines, you may not be able to access your webserver remotely because of firewall issues. Additionally, you should use Firefox instead of Konqueror if using a server running on any of these machines. Please discuss this with a member of the instruction staff before proceeding.


Part 1: Returning a web page (25 points)

As you probably know, a web server works like this:

  1. It reads the incoming HTTP request and figures out the page that's being requested
  2. It figures out the full path to the file on the local file system
  3. It reads the file byte-by-byte and sends the bytes back to the client
Modify the skeleton code so that it performs these actions (note that step 1 has already been done for you). That is, when a browser requests /test.html, your program should figure out which file that is on the local file system, read it, and send back the bytes.
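Steps 2 and 3 might look roughly like the sketch below. It assumes the pages live under a document-root directory (for example "./www", a hypothetical choice) and that the requested path arrives as a string like "/test.html"; resolvePath and readFile are illustrative names, not functions from HttpServer.cpp. Rejecting ".." is an extra precaution so a request cannot reach files outside the document root.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: maps a request path such as "/test.html" onto a file
// under docRoot, e.g. "./www" + "/test.html" -> "./www/test.html".
// Returns "" for paths that do not start with '/' or that contain "..".
std::string resolvePath(const std::string& docRoot, const std::string& requestPath) {
    if (requestPath.empty() || requestPath[0] != '/' ||
        requestPath.find("..") != std::string::npos) {
        return "";
    }
    return docRoot + requestPath;
}

// Reads the entire file into contents. Returns false if it cannot be opened.
bool readFile(const std::string& path, std::string& contents) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        return false;
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy every byte of the file into the buffer
    contents = buffer.str();
    return true;
}
```

The bytes in contents can then be written to the client socket in one call, using contents.size() as the length.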

You do not have to handle any errors at this time (for instance, if the page does not exist). Nor do you have to handle anything other than basic HTML files (for instance, you don't have to deal with images or anything like that).

Note that you will have to make changes to various parts of the program (including, conceivably, the interface/definition of the HttpServer class). Consider what data you will need and how it should be stored.

Create a few simple HTML pages (with links to each other) and test that your server works correctly. Make sure this part is done before proceeding to the rest of the assignment!

 

Part 2: Error handling (10 points)

Modify your code from Part 1 so that an error message is sent back to the client's web browser if the requested page does not exist. For instance, if the browser requests http://localhost:8080/foobar.html and there is no "/foobar.html" page on the local file system, then the server should send back a meaningful message indicating that the page does not exist (this is known as a "404" error).

Additionally, send back an error page if any other unexpected error occurs, such as being unable to allocate memory, missing some necessary data, etc. These are known as "500" errors.
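One possible way to produce these error pages is sketched below: a single function builds a complete HTTP/1.0 response (status line, headers, blank line, body), and two wrappers fill in placeholder 404 and 500 pages. It assumes your handler writes the returned string straight to the socket; the names buildResponse, notFound, and serverError are hypothetical. A 500 response could, for instance, be sent from a catch block for std::bad_alloc.

```cpp
#include <sstream>
#include <string>

// Builds a complete HTTP/1.0 response. Any status other than 200 or 404 is
// treated as 500 in this sketch. The HTML bodies are placeholder text.
std::string buildResponse(int status, const std::string& body,
                          const std::string& contentType = "text/html") {
    std::string reason;
    switch (status) {
        case 200: reason = "OK"; break;
        case 404: reason = "Not Found"; break;
        default:  status = 500; reason = "Internal Server Error"; break;
    }
    std::ostringstream out;
    out << "HTTP/1.0 " << status << ' ' << reason << "\r\n"
        << "Content-Type: " << contentType << "\r\n"
        << "Content-Length: " << body.size() << "\r\n"
        << "\r\n"  // blank line separates headers from the body
        << body;
    return out.str();
}

// 404 page naming the missing resource.
std::string notFound(const std::string& requestPath) {
    return buildResponse(404, "<html><body><h1>404 Not Found</h1><p>" +
                                  requestPath + " does not exist.</p></body></html>");
}

// Generic 500 page for unexpected failures.
std::string serverError() {
    return buildResponse(500, "<html><body><h1>500 Internal Server Error</h1></body></html>");
}
```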

 

Part 3: Statistics (15 points)

Modify your program so that it keeps track of the following statistics:

Further modify your program so that when the browser requests the page "/stats", these statistics are sent back to the browser as HTML (oooh, dynamic content!).

Think carefully about where these values should be stored, using good object-oriented design techniques.
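As one example of that design, the counters could live in their own class that the server owns, with a method that renders them as HTML for "/stats". The two counters below (total requests and 404 errors) are hypothetical placeholders; substitute whichever statistics you are asked to track. A mutex is included because, once Part 4 is done, several handler threads will update these values at the same time (std::mutex requires C++11).

```cpp
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical statistics holder; the counters are illustrative only.
class ServerStats {
public:
    ServerStats() : requests_(0), notFound_(0) {}

    void recordRequest() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++requests_;
    }

    void recordNotFound() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++notFound_;
    }

    // Renders the current values as a small HTML page for "/stats".
    std::string toHtml() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::ostringstream out;
        out << "<html><body><h1>Server statistics</h1><ul>"
            << "<li>Requests: " << requests_ << "</li>"
            << "<li>404 errors: " << notFound_ << "</li>"
            << "</ul></body></html>";
        return out.str();
    }

private:
    std::mutex mutex_;
    unsigned long requests_;
    unsigned long notFound_;
};
```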

 

Part 4: Threading (25 points)

Modify your program so that each page request is handled in its own thread. That is, after the server accepts a connection, it starts a new thread and invokes the function that handles the request. The maximum number of simultaneous handler threads should be configurable in the program (default 5); if all threads are being used, subsequent requests will have to wait.

Threading in C++ is more or less the same as in C, and we provide you an example that you can use, but there are a few important things to note:

Before you start coding, think about how you will need to organize your data, whether you should create any new classes, and how to make the rest of the program "threadsafe" (i.e., avoid race conditions).
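A minimal sketch of the thread cap follows, assuming C++11 std::thread, std::mutex, and std::condition_variable rather than the pthread calls in the provided example (the same structure works with pthread_mutex_t and pthread_cond_t). The accept loop calls acquire() before starting a handler thread, so it blocks while all slots are busy; each handler calls release() when it finishes. ThreadLimiter and simulateRequests are hypothetical names, and simulateRequests merely stands in for the accept loop by launching short dummy "requests".

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Counting limiter: at most maxThreads callers may hold a slot at once.
class ThreadLimiter {
public:
    explicit ThreadLimiter(int maxThreads = 5) : available_(maxThreads) {}

    void acquire() {  // blocks until a slot is free, then takes it
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return available_ > 0; });
        --available_;
    }

    void release() {  // returns a slot and wakes one waiting caller
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++available_;
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int available_;
};

// Stand-in for the accept loop: one thread per "request", capped by the
// limiter. Returns the highest number of handlers observed running at once.
int simulateRequests(int numRequests, int maxThreads) {
    ThreadLimiter limiter(maxThreads);
    std::atomic<int> running(0);
    std::atomic<int> peak(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < numRequests; ++i) {
        limiter.acquire();  // wait here if all handler slots are in use
        threads.emplace_back([&] {
            int now = ++running;
            int prev = peak.load();
            while (now > prev && !peak.compare_exchange_weak(prev, now)) {
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
            --running;
            limiter.release();  // handler done: free the slot
        });
    }
    for (std::thread& t : threads) {
        t.join();
    }
    return peak.load();
}
```

In a real server the handler threads would more likely be detached than joined, and any data they share (statistics, the cache) needs its own locking. Compile with something like g++ -std=c++11 -pthread.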

 

Part 5: Caching (25 points)

In Part 1, your code opens a file, reads its bytes, and then sends them to the browser. If the same page is requested again, your code probably reopens the file, rereads it, and sends the bytes back again. There is no need to reopen and reread a file that hasn't changed, so caching recently read files in memory would likely speed things up.

Implement a cache that stores the bytes of recently-read files in memory. For simplicity, let's say that the cache only holds a fixed number of files (default 5) and uses a FIFO replacement strategy, i.e. if the cache is full and a webpage is requested that is not in the cache, the oldest webpage is evicted.

Keep track of the number of hits and misses in your cache, and modify your solution to Part 3 so that it displays those statistics as well.
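A possible shape for such a cache is sketched below, assuming C++11 for std::mutex. A map holds the file bytes keyed by path, and a deque records insertion order, so eviction always removes the entry that was inserted first, even if it was read recently (that is the difference from LRU). The mutex guards the map, the deque, and the hit/miss counters together. FileCache, get, and put are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <mutex>
#include <string>

// Hypothetical thread-safe FIFO cache of file contents keyed by path.
class FileCache {
public:
    explicit FileCache(std::size_t capacity = 5)
        : capacity_(capacity), hits_(0), misses_(0) {}

    // On a hit, copies the cached bytes into out and returns true.
    bool get(const std::string& path, std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::map<std::string, std::string>::iterator it = entries_.find(path);
        if (it == entries_.end()) {
            ++misses_;
            return false;
        }
        ++hits_;
        out = it->second;
        return true;
    }

    // Stores bytes read from disk after a miss, evicting the oldest entry if full.
    void put(const std::string& path, const std::string& bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (capacity_ == 0 || entries_.count(path) > 0) {
            return;  // nothing to store, or another thread cached it first
        }
        if (entries_.size() >= capacity_) {
            entries_.erase(order_.front());  // FIFO: drop the first-inserted file
            order_.pop_front();
        }
        entries_[path] = bytes;
        order_.push_back(path);
    }

    unsigned long hits() { std::lock_guard<std::mutex> lock(mutex_); return hits_; }
    unsigned long misses() { std::lock_guard<std::mutex> lock(mutex_); return misses_; }

private:
    std::mutex mutex_;
    std::size_t capacity_;
    std::map<std::string, std::string> entries_;
    std::deque<std::string> order_;  // insertion order, oldest at the front
    unsigned long hits_;
    unsigned long misses_;
};
```

This sketch never notices when a file on disk changes; one option would be to also store each file's modification time (for example st_mtime from stat()) and treat a newer file as a miss.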

Again, before you start coding, think about how to organize your data and what classes you will need to create. Be careful about race conditions and other synchronization issues.

 

Extra Credit! Images (10 points)

Modify your web server so that it handles requests for JPEG images. You will need to investigate how to read binary files from the file system (here's a hint). Be sure to modify other parts of your application, such as your solution to Parts 3 and 5, accordingly.
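The two pieces most likely to change for images are sketched below: choosing a Content-Type from the file extension, and reading the file in binary mode so that bytes such as '\0' or "\r\n" pass through unmodified. contentTypeFor and readBinaryFile are hypothetical names, and only the extensions shown are handled.

```cpp
#include <cctype>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: picks a Content-Type from the (case-insensitive) extension.
std::string contentTypeFor(const std::string& path) {
    std::string::size_type dot = path.rfind('.');
    std::string ext = (dot == std::string::npos) ? "" : path.substr(dot + 1);
    for (std::string::size_type i = 0; i < ext.size(); ++i) {
        ext[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(ext[i])));
    }
    if (ext == "jpg" || ext == "jpeg") return "image/jpeg";
    if (ext == "html" || ext == "htm") return "text/html";
    return "application/octet-stream";
}

// Reads every byte of the file, unchanged, into bytes. Returns false on failure.
bool readBinaryFile(const std::string& path, std::vector<char>& bytes) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        return false;
    }
    bytes.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}
```

When sending the data, pass bytes.size() as the length rather than relying on C-string functions such as strlen, which stop at the first '\0' byte found in most JPEG files.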


Academic Honesty

Although you are allowed to work with your partner on this assignment, you should not discuss solutions with other students. Additionally, if you look for help online and find example code, you must cite it if you are going to use it.

If you need help with this assignment, please visit a member of the teaching staff during office hours, or contact one of us to set up an appointment.


Submission
For this assignment:
  1. One member of your group should put your source code files (.cpp and .h) in a directory called "homework5_[your-SEAS-id]". For instance, mine would be homework5_cdmurphy.
  2. Additionally, include a Makefile that compiles all your programs and put that in the same directory, too.
  3. Last, from the directory ABOVE your "homework5_[your-SEAS-id]" directory (i.e., its parent), tar and/or zip the directory with all your files in it. For instance, on eniac you could do "tar -cvf homework5_[your-SEAS-id].tar ./homework5_[your-SEAS-id]".
  4. Submit the tar/zip file in Blackboard.

Failure to properly follow the submission instructions could result in a delay of grading your assignment and/or a lateness penalty, so please be sure to do it correctly!

Homeworks are to be submitted via Blackboard, as described on the course overview page. Please be sure to tar and/or zip your files into a single submission file!