CIT 597 Assignment 3: Working with XML
Fall 2008, David Matuszek

Purposes of this assignment:

General idea of the assignment:

The home page for last year's version of CIT597, http://www.cis.upenn.edu/~matuszek/cit597-2007/index.html, looks a great deal like the home page for this year's course--but complete, rather than still under construction. Your assignment is to convert the data on this page into XML, along with an appropriate DTD and CSS.

You should include all the data from:

Do not include:

You should be working with the source code (easily downloaded from the web), not just the visible page. The URLs are an important part of the (invisible) data you need to capture. However, please do not include the actual email addresses--just use matuszek and sachisac as the complete addresses. Putting email addresses on a web page is an open door for spammers.

Write a DTD that correctly describes your XML. Use it to validate your XML.

Write CSS to allow browsers to display the XML in a format that is easily legible to human readers (and reasonably attractive). The various types of information--dates, links to PowerPoint presentations, links to readings, etc.--should be visually distinct.

Put the XML, the DTD, and the CSS in three separate files.

Details:

XML

First, you need to figure out how to represent this data in XML. I have used icons (described in the Key at the bottom of the page) to categorize most of the data in the main table, so that should be your starting point. Where there are URLs, you need to capture the URLs, not just my text labels for them. (Get all the actual data from this table.)

When you display the URLs in a browser, they do not have to be functional links--that is, it is fine if clicking on them does nothing. You should, however, be able to copy them and paste them into the URL field of your browser, and have them work.

I chose this page because it is a ready-made source of "real" data. It is reasonably well-organized, so it shouldn't give you too much trouble, but still has some real-world imperfections. Finally, the page belongs to me, so there are no legal issues to deal with.

Unless you have a good reason to use namespaces, avoid them.

Since there is quite a bit of data for you to translate to XML, you may find parts of this assignment repetitious and boring. Good tools, for example a good text editor with well-designed search-and-replace capabilities, can help a lot (especially if you know something about regular expressions). Think about how you can minimize your effort.
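As one illustration of letting a regular expression do the repetitive part, here is a minimal Python sketch that turns simple HTML anchors into XML elements. The element name link, the attribute name url, and the sample URL are all hypothetical choices made for this example, not names the assignment requires, and the pattern only handles straightforward anchors with double-quoted href values.

```python
import html
import re
from xml.sax.saxutils import escape, quoteattr

# Matches simple anchors such as <a href="slides/intro.ppt">Introduction</a>.
# Assumption: href values are double-quoted and anchors are not nested.
ANCHOR = re.compile(r'<a\s+href="([^"]*)"[^>]*>(.*?)</a>',
                    re.IGNORECASE | re.DOTALL)

def anchors_to_xml(source, tag="link"):
    """Return one XML element (as a string) per anchor found in source."""
    elements = []
    for url, label in ANCHOR.findall(source):
        url = html.unescape(url)                 # &amp; in the HTML becomes &
        label = re.sub(r"<[^>]+>", "", label)    # drop inline tags like <b>
        label = " ".join(html.unescape(label).split())  # collapse whitespace
        # quoteattr and escape re-escape the text correctly for XML.
        elements.append(f"<{tag} url={quoteattr(url)}>{escape(label)}</{tag}>")
    return elements

# Hypothetical sample row, not taken from the real course page.
sample = '<td><a href="http://example.com/a.ppt">Intro\n to XML</a></td>'
for line in anchors_to_xml(sample):
    print(line)
```

The same pattern can usually be pasted into a text editor's search-and-replace dialog; the script form is only convenient when there are many rows to convert.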

DTD

Although it is possible to write the XML first and the DTD afterward (or vice versa), it is usually better to develop them simultaneously. As you figure out what you want each part of the XML to look like, you can encode that in the DTD.

Do not allow mixed content. Mixed content is Eeeevil.
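In DTD terms, an element has mixed content when its content model combines #PCDATA with child elements, so the XML interleaves text and tags inside one element. The following Python sketch is one possible way to spot such elements in a draft; the element names course, topic, title, and link are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

def mixed_content_elements(root):
    """Return the tags of elements holding both child elements and text."""
    offenders = []
    for elem in root.iter():
        children = list(elem)
        if not children:
            continue  # text-only or empty elements are not mixed
        # Text before the first child is stored in elem.text.
        has_text = bool(elem.text and elem.text.strip())
        # Text after each child is stored in that child's tail.
        has_tail = any(c.tail and c.tail.strip() for c in children)
        if has_text or has_tail:
            offenders.append(elem.tag)
    return offenders

# Hypothetical document: the first topic mixes text with a link element.
doc = ET.fromstring(
    "<course>"
    "<topic>Read <link>this</link> first</topic>"
    "<topic><title>XML</title><link>slides</link></topic>"
    "</course>"
)
print(mixed_content_elements(doc))
```

Whitespace used only for indentation is ignored here, so a pretty-printed file is not reported as mixed.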

Validate your XML against the DTD. Most browsers do at least some validation; you can find other validators on the Web, such as Validome.
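Before validating, it can save time to confirm that the file is at least well-formed. The Python sketch below does only that: the standard library parser used here does not check the document against the DTD, so it is a quick pre-check and not a substitute for a real validator. The function name is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)  # includes the line and column of the problem
    return None

# A matched pair of tags parses; a missing close tag does not.
print(well_formedness_error("<course><topic/></course>"))
print(well_formedness_error("<course><topic></course>"))
```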

CSS

You don't have to recreate my HTML page, or anything like it. But it should look like a page designed for people to read, not just data dumped onto the screen. Use colors, fonts, layouts, whatever you think appropriate to make it look nice and be easy to read and understand.

Due date:

Tuesday, September 30, by midnight. As before, put the XML page and its associated files on the web, and send me the URL. Be sure to put "597" in the Subject: line!