XML/DTD Assignment
Fall 2002, David Matuszek

Purposes of this assignment:

From HTML to XML:

HTML is designed to display information to humans; XML is designed to make information available to computers. Humans can deal with wide-ranging, unstructured information; computer programs are typically designed to process well-structured information in a narrowly defined domain.

If you have taken seriously the instructions in the previous assignment, you should have created a Web page that had a reasonable amount of information in it. It should be possible to organize and categorize at least some of that information. Almost certainly, not all of it will fit into a predefined mold.

Your assignment is to turn your HTML page, suitable for a human, into a highly structured and organized XML page, complete with an internal DTD, suitable for a computer. The XML should capture all the information that you feel is "essential" to the topic.
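As a rough illustration of what "XML with an internal DTD" means, here is a minimal sketch. The element names (storms, storm, name, year) and the attribute kind with its three allowed values are hypothetical, chosen only for this example; your own topic will call for different ones.

```xml
<?xml version="1.0"?>
<!DOCTYPE storms [
  <!-- Internal DTD: the declarations live inside the document itself. -->
  <!-- All element and attribute names here are illustrative assumptions. -->
  <!ELEMENT storms (storm+)>
  <!ELEMENT storm (name, year)>
  <!ATTLIST storm kind (hurricane | tornado | blizzard) #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
]>
<storms>
  <storm kind="hurricane">
    <name>Andrew</name>
    <year>1992</year>
  </storm>
</storms>
```

The root element name after DOCTYPE must match the document's root element, and every element used in the body needs a declaration in the DTD.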

Here are some suggestions on how to proceed:

If your page is about a category of things, say "Amazon tree frogs" or "Great storms of the 20th century," each member of the category has many distinguishing characteristics, and there is a lot you can say. If your page is about a single thing, say Hurricane Andrew (1992) or tiger-leg leaf frogs, you will have a harder time finding enough information to structure; one way to proceed is to pretend your page is one of hundreds of pages on similar topics, and to design a common form that would be suitable for all of them.
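One way to picture such a "common form" is a DTD with a few required fields plus optional ones for whatever does not apply to every page. The sketch below is only an assumption about how that might look: the names region and notes are hypothetical, and the notes text is a placeholder rather than real data.

```xml
<?xml version="1.0"?>
<!DOCTYPE storm [
  <!-- One storm per document; the same form would serve every page -->
  <!-- in the imagined collection. Names are illustrative only.     -->
  <!ELEMENT storm (name, year, region*, notes?)>
  <!ATTLIST storm kind CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT region (#PCDATA)>   <!-- zero or more regions -->
  <!ELEMENT notes (#PCDATA)>    <!-- optional catch-all -->
]>
<storm kind="hurricane">
  <name>Andrew</name>
  <year>1992</year>
  <notes>Placeholder: remarks that do not fit any other field.</notes>
</storm>
```

The optional notes element is one way to hold information that does not fit the predefined mold, without weakening the structure of the rest.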

Go back and forth between the XML and the DTD. For example, you might start by putting a little information into XML, then writing a DTD to describe it, then improving the DTD, adjusting the XML to fit the new DTD, and so on. Work with small amounts of information, so that you don't mind making changes to it. The final result will be much more sensible than if you write all of one part, then force the other to conform to it.
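A second pass through that loop might look something like the following sketch, where an earlier draft is revised after the DTD exposed an awkward choice. The particular change shown (moving year from a child element to an attribute and adding an optional description) is a made-up example, not a recommendation.

```xml
<?xml version="1.0"?>
<!DOCTYPE storms [
  <!-- Draft 1 declared: storm (name, year), with year as a child element. -->
  <!-- Draft 2 moves year into an attribute and allows an optional description. -->
  <!ELEMENT storms (storm*)>
  <!ELEMENT storm (name, description?)>
  <!ATTLIST storm year CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
]>
<storms>
  <!-- Draft 1 instance was: <storm><name>Andrew</name><year>1992</year></storm> -->
  <storm year="1992">
    <name>Andrew</name>
  </storm>
</storms>
```

With only one or two records in the file, a change like this takes a minute; with fifty records already written, it would be tempting to leave the weaker design in place.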

Download and use jEdit, with the XML plugin. This gives you syntax coloring and checks for errors when you save the file. If you have an internal DTD, it also gives you popup menus with your tags and checkbox dialogs for your attributes, and it validates your XML against the DTD when you save. I think it's a tremendous help. Unfortunately, the XML plugin has not been installed in the labs here (I'm working on getting them to fix that).

Brevity was explicitly not a design goal of XML. When you have worked out a logical structure for your XML, you will probably find that each tiny bit of information takes several lines to express in XML. You're going to get bored writing this stuff. Write a few pages anyway, at least enough to exercise all the features of your DTD, and to be sure that the DTD is suitable for any additional data you might add. You do not have to create an encyclopedia for this one assignment; something approximately the size of my example will be fine (view with Internet Explorer; Netscape and Opera just show tag content).

Remember that you are free to change topics, if your current topic turns out to be unsuitable. In fact, if you do feel that another topic would be a better choice, now is the best time to switch, before you put a lot more effort into the old topic. Trying to use XML should give you a much better idea of what kind of topic will work best in the future; but in any case, it should be something that you personally are interested in, because this course will be a lot more enjoyable that way. (If you do change topics from the XHTML assignment, it would be nice to drop me a note telling me about your new topic. Even better (but not required) would be a new HTML page about it.)

"Turn in" your XML (with embedded DTD) by putting a prominent link to it on your Web page. (You can move it to a less conspicuous place later, but your Web site for this course should accumulate all your assignments, so that at the end there is easy access to all your coursework.)

About Home pages: There is no technical difference between a "home page" and any other page. A home page is just one you design as a starting point for a bunch of other pages; for example, the CIT597 home page is just a starting point for CIT597 students. Your home page for this course need not be the same as your "public" home page. If you already have a Web site and you don't want all these assignments linked to it, just create a new "home page" for the course and give me that URL.

Due Date: Thursday, October 3 (midnight of that day).