CIT 597 Assignment 7: Perl
Fall 2006, David Matuszek

Purposes of this assignment:

General idea of the assignment:

Translate an HTML page into XHTML.


Name your program xhtmlize.

Your program should accept any number of command-line arguments. It should check that each argument is the name of a file with a .htm or .html extension, and that the named file exists. It should also check that a file with the same name but an .xhtml extension does not already exist. For each acceptable file, the program should read the input file and silently produce the corresponding .xhtml file. If an input file cannot be processed for any reason (including errors during translation), the program should produce an appropriate error message for that file, but it should continue to process the remaining input files.
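The checks described above could be organized roughly as follows. This is only a sketch, not a required design: the names output_name_for and process_all are hypothetical, and the actual translation is passed in as a code reference so that the checking logic stands on its own. Each check dies with a message, and eval traps the failure so that the remaining files are still processed.

```perl
use strict;
use warnings;

# Hypothetical helper: return the .xhtml name for an acceptable input file,
# or die with a message explaining why the file is rejected.
sub output_name_for {
    my ($in) = @_;
    $in =~ /^(.+)\.html?\z/
        or die "$in: not a .htm or .html file\n";
    my $out = "$1.xhtml";
    -f $in  or die "$in: no such file\n";
    -e $out and die "$in: $out already exists\n";
    return $out;
}

# Hypothetical driver: $translate is a code reference taking the input and
# output names. A failure in one file must not stop the remaining files.
sub process_all {
    my ($translate, @files) = @_;
    my $failures = 0;
    for my $file (@files) {
        # eval traps die, so the error is reported for this file only.
        my $ok = eval { $translate->($file, output_name_for($file)); 1 };
        unless ($ok) {
            warn "xhtmlize: $@";
            $failures++;
        }
    }
    return $failures;
}
```

A main program might then call process_all(\&translate_file, @ARGV), where translate_file is whatever function you write to do the conversion.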

The output .xhtml file should begin with this DOCTYPE declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The remainder of the file should be enclosed in <html>...</html> tags.
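One possible way to produce this outer structure is sketched below. The function name wrap_document is hypothetical. The sketch assumes that any DOCTYPE or <html> tags already in the input should be discarded, so that they do not end up duplicated, and it uses the standard W3C DTD URL for XHTML 1.0 Transitional as the system identifier.

```perl
use strict;
use warnings;

# Hypothetical helper: drop any DOCTYPE and <html> tags already present,
# then wrap what is left in the DOCTYPE and a single <html> element.
sub wrap_document {
    my ($text) = @_;
    # Public identifier as given in the assignment; the system identifier
    # is the standard W3C URL for the XHTML 1.0 Transitional DTD.
    my $doctype =
          qq{<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n}
        . qq{    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n};
    $text =~ s/<!DOCTYPE[^>]*>\s*//i;      # discard an existing DOCTYPE
    $text =~ s/<\/?html\b[^>]*>\s*//gi;    # discard existing <html> and </html>
    $text =~ s/\s+\z//;                    # trim trailing whitespace
    return $doctype . "<html>\n" . $text . "\n</html>\n";
}
```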

Tags should be properly nested. To achieve this, you will need to keep a stack of start tags.
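The stack idea can be sketched as follows. This is one possible approach, not the required one: the name fix_nesting is hypothetical, and the policy choices are assumptions (an end tag closes everything opened after its matching start tag, an end tag with no matching start tag is dropped, and anything still open at the end of the input is closed). The sketch also assumes that no tag or comment contains a ">" character inside it.

```perl
use strict;
use warnings;

# Hypothetical helper: return $text with end tags added so that tags nest.
sub fix_nesting {
    my ($text) = @_;
    my %empty = map { $_ => 1 } qw(br hr img);   # never pushed on the stack
    my @stack;                                   # names of open start tags
    my $out = '';
    # Split into tags and the text between them; the capture keeps the tags.
    for my $piece (split /(<[^>]*>)/, $text) {
        if ($piece =~ m{^<\s*/\s*([A-Za-z][\w:-]*)}) {       # end tag
            my $name = lc $1;
            if (grep { $_ eq $name } @stack) {
                # Close everything opened after the matching start tag.
                while (@stack) {
                    my $open = pop @stack;
                    $out .= "</$open>";
                    last if $open eq $name;
                }
            }
            # Otherwise there is no matching start tag; drop the end tag.
        }
        elsif ($piece =~ m{^<([A-Za-z][\w:-]*)}) {           # start tag
            my $name = lc $1;
            my $self_closed = $piece =~ m{/>\z};
            push @stack, $name unless $self_closed || $empty{$name};
            $out .= $piece;
        }
        else {                                               # text or comment
            $out .= $piece;
        }
    }
    $out .= "</$_>" for reverse @stack;   # close whatever is still open
    return $out;
}
```

For example, fix_nesting("<b><i>text</b> more") yields "<b><i>text</i></b> more". Note that this simple policy does not know that a second <li> should close the first one; handling such cases would need extra rules.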

All tags must be properly closed. There are a few tags you should recognize and close immediately: <br ...> becomes <br ... />, <hr ...> becomes <hr ... />, and <img ...> becomes <img ... />.
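A single substitution can handle these three tags. The sketch below uses a hypothetical function name, close_empty_tags, and assumes that a tag which already ends in "/>" should come out unchanged apart from spacing and letter case.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite <br ...>, <hr ...>, and <img ...> so that
# each ends in " />" and has a lowercase tag name.
sub close_empty_tags {
    my ($text) = @_;
    # $1 is the tag name, $2 the attributes (if any). The optional
    # whitespace and slash before ">" are discarded and written back
    # uniformly as " />".
    $text =~ s{<(br|hr|img)\b([^>]*?)\s*/?>}{<\L$1\E$2 />}gi;
    return $text;
}
```

The \b after the tag name keeps the pattern from matching longer names that merely begin with br, hr, or img.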

All tag names and all attribute names must be in lowercase.
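Only the names should be lowercased; attribute values and the text between tags must be left alone. One way to do this is sketched below, using the hypothetical function names lowercase_tag and lowercase_tags. The attribute pattern consumes each value along with its name, so that an "=" inside a quoted value is not mistaken for another attribute. The sketch assumes that no tag contains a ">" inside an attribute value.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the tag name and attribute names of one tag.
sub lowercase_tag {
    my ($tag) = @_;
    # Leave comments, DOCTYPE declarations, and the like untouched.
    return $tag unless $tag =~ m{^</?[A-Za-z]};
    # Lowercase the tag name, keeping the slash of an end tag.
    $tag =~ s{^<(/?)([A-Za-z][\w:-]*)}{<$1\L$2};
    # Lowercase each attribute name; $2 is "=" plus the value, unchanged.
    $tag =~ s{([A-Za-z_:][\w:.-]*)(\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))}{\L$1\E$2}g;
    return $tag;
}

# Hypothetical helper: apply lowercase_tag to every tag in the text.
sub lowercase_tags {
    my ($text) = @_;
    $text =~ s{(<[^>]*>)}{lowercase_tag($1)}ge;
    return $text;
}
```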

Attribute values must be quoted.
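Quoting can be done with a substitution that recognizes three forms of value: double-quoted, single-quoted, and unquoted. The sketch below uses the hypothetical names quote_value, quote_attributes, and quote_all_attributes. It assumes that values already in single quotes may stay that way (XML allows either kind of quote), and it removes any whitespace around the "=" sign.

```perl
use strict;
use warnings;

# Hypothetical helper: add double quotes unless the value is already quoted.
sub quote_value {
    my ($value) = @_;
    return $value =~ /^["']/ ? $value : qq{"$value"};
}

# Hypothetical helper: quote every attribute value in one start tag.
sub quote_attributes {
    my ($tag) = @_;
    # $1 is the attribute name; $2 is the value in any of its three forms.
    $tag =~ s{([A-Za-z_:][\w:.-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)}
             {my ($n, $v) = ($1, $2); "$n=" . quote_value($v)}ge;
    return $tag;
}

# Hypothetical helper: apply quote_attributes to every start tag, leaving
# the text between tags (which may also contain "=") untouched.
sub quote_all_attributes {
    my ($text) = @_;
    $text =~ s{(<[A-Za-z][^>]*>)}{quote_attributes($1)}ge;
    return $text;
}
```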

Any start tag with a name attribute should have the id attribute added to it, with the same value.
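This rule might be implemented along the following lines. The function name add_id_from_name is hypothetical, and two choices here are assumptions that the assignment does not state: a tag that already has an id attribute is left unchanged, and the new id value is always written in quotes. The simple patterns could be fooled by text such as name= appearing inside another attribute's value.

```perl
use strict;
use warnings;

# Hypothetical helper: given one tag, add id="..." if it has a name attribute.
sub add_id_from_name {
    my ($tag) = @_;
    return $tag unless $tag =~ m{^<[A-Za-z]};    # start tags only
    return $tag if $tag =~ /\sid\s*=/i;          # assumption: keep existing id
    if ($tag =~ /\sname\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/i) {
        my $value = $1;
        $value = qq{"$value"} unless $value =~ /^["']/;
        # Remove the closing ">" or "/>", remembering which one it was.
        if ($tag =~ s{\s*(/?)>\z}{}) {
            my $close = $1 ? ' />' : '>';
            return "$tag id=$value$close";
        }
    }
    return $tag;
}
```

For example, <a name="top"> becomes <a name="top" id="top">.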

In XHTML, attribute minimization is forbidden. However, you can ignore this requirement; I don't think I've ever seen attribute minimization in use.

Use regular expressions to do most of the work. Use functions to break up the work into manageable chunks.

Due date:

Monday, November 27, before midnight. Submit your program via Blackboard.