CIT 597 Assignment 7: Perl
Fall 2006, David Matuszek
Translate an HTML page into XHTML.
Name your program xhtmlize.
Your program should accept any number of command-line arguments. It should
check that each argument is the name of a file with a .htm or .html extension,
and that the named file exists. It should also check that a file with the same
name but an .xhtml extension does not already exist. For each
acceptable file, the program should read the input file and silently produce
the corresponding .xhtml file.
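Here is a minimal Perl sketch of the per-argument checks described above. The function name output_name_for is hypothetical. It returns the .xhtml name for an acceptable file. Otherwise it dies with a message, so the caller can report the problem and move on. Matching the extension case-sensitively is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: given one command-line argument, return the name of
# the .xhtml file to create, or die with a message saying why the argument
# is not acceptable.
sub output_name_for {
    my ($input) = @_;
    # The name must end in .htm or .html (matched case-sensitively here).
    my ($base) = $input =~ /^(.*)\.html?\z/s
        or die "$input: not a .htm or .html file\n";
    my $output = "$base.xhtml";
    die "$input: file does not exist\n" unless -e $input;
    die "$output: already exists, will not overwrite\n" if -e $output;
    return $output;
}
```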
If an input file cannot be processed for any reason (including errors during
translation), the program should produce an appropriate error message for that
file, but it should continue to process the remaining input files.
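One way to keep going after a failure is to wrap the work for each file in eval, report $@ on STDERR, and continue. The Perl sketch below assumes that each per-file step reports errors by calling die. The names process_all and its handler argument are hypothetical.

```perl
use strict;
use warnings;

# Run $handler on each file name in turn. The handler (hypothetical; it would
# validate, translate and write one file) signals failure by dying. eval traps
# the die, so an error in one file is reported and the rest are still processed.
# Returns the number of files that failed.
sub process_all {
    my ($handler, @files) = @_;
    my $failures = 0;
    for my $file (@files) {
        my $ok = eval { $handler->($file); 1 };
        next if $ok;
        my $err = $@ || "unknown error\n";
        print STDERR "xhtmlize: $file: $err";
        $failures++;
    }
    return $failures;
}
```

A main program might then end with exit(process_all(\&translate_file, @ARGV) ? 1 : 0). Here translate_file is a hypothetical function that checks and translates one file. The nonzero exit status on failure is also an assumption, because the assignment does not specify one.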
The output .xhtml file should begin with this DOCTYPE declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The remainder of the file should be enclosed in <html>...</html> tags.
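This small Perl sketch writes the required prologue and wrapper. It removes any DOCTYPE and html tags that the input already contains. That is one assumed way to avoid duplicates, not something the assignment prescribes. The name wrap_document is hypothetical.

```perl
use strict;
use warnings;

# The DOCTYPE declaration required by the assignment, split over two source
# lines only for readability; it is a single line in the output.
my $DOCTYPE = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
            . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';

# Put the DOCTYPE first and enclose everything else in one <html> element.
# Policy choice (an assumption): any DOCTYPE or <html>/</html> tags already
# in the input are removed first, so they are not duplicated.
sub wrap_document {
    my ($content) = @_;
    $content =~ s/<!DOCTYPE[^>]*>//gi;
    $content =~ s{</?html\b[^>]*>}{}gi;
    return "$DOCTYPE\n<html>\n$content\n</html>\n";
}
```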
Tags should be properly nested. To achieve this, you will need to keep a stack of start tags.
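The stack idea can be sketched in Perl as follows. This is one possible policy, not the required one:

- An end tag closes every element opened after its matching start tag.
- An end tag with no matching start tag is dropped.
- Anything still open at the end of the input is closed.

The sketch assumes tag names are already lowercase and that empty tags are already self-closing (ending in />), so they are never pushed. The name balance_tags is hypothetical. The simple tag pattern does not handle > inside attribute values or comments.

```perl
use strict;
use warnings;

# Balance tags with a stack of open element names (innermost last).
sub balance_tags {
    my ($html) = @_;
    my @stack;
    my $out = '';
    # split with a capturing group keeps the tags as separate pieces.
    for my $piece (split /(<[^>]*>)/, $html) {
        if ($piece =~ m{^</(\w+)\s*>$}) {
            # End tag: ignore it if no matching start tag is open.
            my $name = $1;
            next unless grep { $_ eq $name } @stack;
            # Close inner elements left open, then the matching element.
            while (@stack) {
                my $open = pop @stack;
                $out .= "</$open>";
                last if $open eq $name;
            }
        }
        elsif ($piece =~ m{^<(\w+)[^>]*(?<!/)>$}) {
            # Start tag that is not self-closing: remember it.
            push @stack, $1;
            $out .= $piece;
        }
        else {
            # Text, self-closing tags, comments, DOCTYPE: copy unchanged.
            $out .= $piece;
        }
    }
    # Close anything still open at the end, innermost first.
    $out .= "</$_>" for reverse @stack;
    return $out;
}
```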
All tags must be properly closed. A few tags are empty elements that never have a matching end tag. You should recognize these and close them immediately by rewriting each one in self-closing form, ending in />.
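This Perl sketch closes empty tags with a regular expression. It is hedged in several ways. The tag names in @EMPTY_TAGS (br, hr, img) are an illustrative assumption, so substitute the tags the assignment actually names. The name close_empty_tags is hypothetical. Removing redundant end tags for those elements, such as </br>, is a further assumption.

```perl
use strict;
use warnings;

# ASSUMPTION: illustrative list of empty elements; adjust as required.
my @EMPTY_TAGS = qw(br hr img);

sub close_empty_tags {
    my ($html) = @_;
    my $names = join '|', map { quotemeta } @EMPTY_TAGS;
    # <br>, <br/>, <hr size=2> ... become self-closing: <br />, <hr size=2 />.
    # A tag that is already written as <br /> comes out unchanged.
    $html =~ s{<($names)\b([^>]*?)\s*/?>}{<$1$2 />}gi;
    # An end tag such as </br> is now redundant, so remove it.
    $html =~ s{</(?:$names)\s*>}{}gi;
    return $html;
}
```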
All tag names and all attribute names must be in lowercase.
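Lowercasing can be done with a substitution that runs code (the /e modifier) for each tag, as in this Perl sketch. Attribute values are matched and skipped, so a value such as Page.HTML keeps its case. Both function names are hypothetical.

```perl
use strict;
use warnings;

# Lowercase attribute names inside one tag's attribute text. Each attribute
# is matched together with its value (quoted or unquoted), and only the name
# is lowercased, so values such as URLs keep their original case.
sub lowercase_attr_names {
    my ($attrs) = @_;
    $attrs =~ s{([A-Za-z][\w:.-]*)(\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?}
               {lc($1) . (defined $2 ? $2 : '')}ge;
    return $attrs;
}

# Lowercase the name of every start and end tag, plus its attribute names.
# Text between tags and the DOCTYPE declaration are left unchanged.
sub lowercase_names {
    my ($html) = @_;
    $html =~ s{<(/?)(\w+)([^>]*)>}{
        my ($slash, $name, $attrs) = ($1, $2, $3);
        '<' . $slash . lc($name) . lowercase_attr_names($attrs) . '>'
    }ge;
    return $html;
}
```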
Attribute values must be quoted.
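This Perl sketch adds quotes to unquoted attribute values. It assumes double quotes are the desired style. Values that are already quoted, with either kind of quote, are left unchanged. The name quote_attr_values is hypothetical.

```perl
use strict;
use warnings;

# Put double quotes around unquoted attribute values in every start tag.
# Values already in single or double quotes are matched as a whole and left
# alone, so an '=' inside a quoted value is never mistaken for a new attribute.
# Limitation: a '>' inside a quoted value would end the tag match early.
sub quote_attr_values {
    my ($html) = @_;
    $html =~ s{<(\w+)([^>]*)>}{
        my ($tag, $attrs) = ($1, $2);
        $attrs =~ s{(=\s*)("[^"]*"|'[^']*'|[^\s"'>]+)}{
            my ($eq, $val) = ($1, $2);
            $val =~ /^["']/ ? $eq . $val : $eq . '"' . $val . '"'
        }ge;
        "<$tag$attrs>"
    }ge;
    return $html;
}
```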
Any start tag with a name attribute should have the id attribute added to
it, with the same value.
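This Perl sketch implements the name-to-id rule. It assumes attribute values have already been quoted and that an existing id attribute should be left alone. The name add_id_from_name is hypothetical.

```perl
use strict;
use warnings;

# For each start tag that has a quoted name attribute but no id attribute,
# append id with the same (quoted) value. Assumes attribute values have
# already been quoted; an unquoted name is ignored by this sketch.
sub add_id_from_name {
    my ($html) = @_;
    $html =~ s{<(\w+)([^>]*)>}{
        my ($tag, $attrs) = ($1, $2);
        my ($value) = $attrs =~ /\bname\s*=\s*("[^"]*"|'[^']*')/i;
        if (defined $value && $attrs !~ /\bid\s*=/i) {
            # Insert before any trailing "/" so a self-closing tag stays valid.
            $attrs =~ s{(\s*/?)\z}{ id=$value$1};
        }
        "<$tag$attrs>"
    }ge;
    return $html;
}
```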
In XHTML, attribute minimization (writing an attribute with no value, such as checked in place of checked="checked") is forbidden. However, you can ignore this requirement; I don't think I've ever seen attribute minimization in use.
Use regular expressions to do most of the work. Use functions to break up the work into manageable chunks.