Regular Expression Assignment
Fall 2002, David Matuszek

Purposes of this assignment:

About HTML:

Most Web pages are written in a language called HTML (HyperText Markup Language). HTML uses tags (including both start tags and end tags), and some start tags have attributes.

A start tag is enclosed in angle brackets, like this: <tag>
and may have attributes, like this: <tag attribute1=value1 attribute2 = "value2">

Notes:

An end tag has a slash immediately after the opening angle bracket, as in </tag> or </tag >, and has the same identifier as the corresponding start tag, for example, <body>...</body>. End tags never have attributes.
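The tag syntax described above can be sketched as a pair of regular expressions. This is only one possible starting point, not a complete solution. The class name TagScanner, the method scan, and both patterns are made up for illustration. The sketch assumes that tag and attribute names are case-insensitive (it lowercases them), it only picks up attributes that have a value, and it will be confused by a '>' inside a quoted value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagScanner {
    // A start tag: '<', a name, then everything up to the next '>'.
    // Group 1 is the tag name; group 2 is the raw attribute text.
    // End tags do not match, because '/' cannot begin a name.
    static final Pattern START_TAG =
        Pattern.compile("<([A-Za-z][A-Za-z0-9]*)([^>]*)>");

    // One attribute with a value: a name, optional spaces around '=',
    // then a value that is double-quoted (group 2), single-quoted
    // (group 3), or unquoted (group 4).
    static final Pattern ATTRIBUTE = Pattern.compile(
        "([A-Za-z][-A-Za-z0-9_:.]*)\\s*=\\s*"
        + "(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");

    /** Returns "tag" and "tag attribute=value" entries, in the order found. */
    public static List<String> scan(String html) {
        List<String> found = new ArrayList<>();
        Matcher tag = START_TAG.matcher(html);
        while (tag.find()) {
            String name = tag.group(1).toLowerCase();
            found.add(name);
            Matcher attr = ATTRIBUTE.matcher(tag.group(2));
            while (attr.find()) {
                // Exactly one of the three value groups is non-null.
                String value = attr.group(2) != null ? attr.group(2)
                             : attr.group(3) != null ? attr.group(3)
                             : attr.group(4);
                found.add(name + " " + attr.group(1).toLowerCase() + "=" + value);
            }
        }
        return found;
    }
}
```

Matching the whole tag first and then scanning only its attribute text keeps each pattern small, and it prevents ordinary page text that happens to contain an '=' from being mistaken for an attribute.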

Your assignment:

Write a Java program that reads in an HTML file, finds all tags, attributes, and values, and displays a list of all that occur, along with how many times each tag occurred. For example, if the HTML file contains both <applet height=300 width=500 code="foo.class"></applet> and <applet height=500 width=500 code="bar.class"></applet>, your program should print something like:

applet (2 occurrences)
     code: foo.class, bar.class
     height: 300, 500
     width: 500

The tags should be listed in alphabetical order, and should not be repeated. That is, if a tag occurs multiple times, that fact should be indicated by a count (as in the above example). The attributes and values may be in any order. Within any given tag, do not repeat the names of attributes (but you don't have to count occurrences); within any given tag-attribute pair, do not repeat values (but again, don't bother to count them). Hint: This part should be fairly easy if you remember your Collections from CIT594.
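One way to meet the ordering and no-repeats requirements above is to let the Collections classes do the work. The following is a minimal sketch, not the required design. The class TagReport and its methods recordTag, recordAttribute, and format are hypothetical names. It assumes a sorted map for the tags (alphabetical order), a sorted map for each tag's attributes, and an insertion-ordered set for each attribute's values, since the attributes and values may be in any order.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TagReport {
    // Tag name -> number of occurrences; TreeMap keeps tags alphabetical.
    private final Map<String, Integer> counts = new TreeMap<>();
    // Tag name -> (attribute name -> distinct values seen).
    private final Map<String, Map<String, Set<String>>> attributes = new TreeMap<>();

    /** Call once for every start tag found. */
    public void recordTag(String tag) {
        counts.merge(tag, 1, Integer::sum);
        attributes.computeIfAbsent(tag, k -> new TreeMap<>());
    }

    /** Call once for every attribute=value pair; repeats are ignored by the Set. */
    public void recordAttribute(String tag, String attribute, String value) {
        attributes.computeIfAbsent(tag, k -> new TreeMap<>())
                  .computeIfAbsent(attribute, k -> new LinkedHashSet<>())
                  .add(value);
    }

    /** Builds the report text, one tag per block. */
    public String format() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> tag : counts.entrySet()) {
            int n = tag.getValue();
            out.append(tag.getKey()).append(" (").append(n)
               .append(n == 1 ? " occurrence)" : " occurrences)").append('\n');
            for (Map.Entry<String, Set<String>> attr
                     : attributes.get(tag.getKey()).entrySet()) {
                out.append("     ").append(attr.getKey()).append(": ")
                   .append(String.join(", ", attr.getValue())).append('\n');
            }
        }
        return out.toString();
    }
}
```

Recording the two applet tags from the example (two calls to recordTag, and one call to recordAttribute per attribute) produces the same four lines as the sample output. Sets discard duplicates on their own, so the second width=500 needs no special handling.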

Use a FileDialog or a JFileChooser to choose the input file; display the results in a scrollable text area on the screen.

It should be possible to use your program on a second file (and a third, and a fourth, ...) without quitting and restarting.

The user interface should be kept very simple: maybe one button to load in a file and display results, and a second button to quit the program. If you add extra features to the program, be sure that the user interface remains totally obvious.

Display your name in the GUI.
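The interface requirements above could be met with something as small as the following Swing sketch. It is offered only as an illustration under stated assumptions. The class name TagViewer, the window title, the button labels, and the "Your Name Here" label are placeholders. The analyze method is a stand-in that merely counts '<' characters; a real program would run the tag analysis there. The sketch also assumes the file can be read as UTF-8 text.

```java
import java.awt.BorderLayout;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.swing.JButton;
import javax.swing.JFileChooser;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

public class TagViewer {
    // Placeholder analysis (an assumption for this sketch): counts '<' characters.
    public static String analyze(String html) {
        long count = html.chars().filter(c -> c == '<').count();
        return count + " '<' characters found\n";
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("HTML Tag Lister");
            JTextArea results = new JTextArea(25, 60);
            results.setEditable(false);
            JFileChooser chooser = new JFileChooser();

            // Each click handles one file, so any number of files can be
            // processed without restarting the program.
            JButton load = new JButton("Load file...");
            load.addActionListener(e -> {
                if (chooser.showOpenDialog(frame) != JFileChooser.APPROVE_OPTION) {
                    return; // the user cancelled the dialog
                }
                File file = chooser.getSelectedFile();
                try {
                    String html = new String(
                        Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
                    results.setText(analyze(html));
                    results.setCaretPosition(0); // scroll back to the top
                } catch (IOException ex) {
                    results.setText("Could not read " + file + ": " + ex.getMessage());
                }
            });

            JButton quit = new JButton("Quit");
            quit.addActionListener(e -> System.exit(0));

            JPanel buttons = new JPanel();
            buttons.add(load);
            buttons.add(quit);

            frame.add(new JLabel("Your Name Here"), BorderLayout.NORTH);
            frame.add(new JScrollPane(results), BorderLayout.CENTER);
            frame.add(buttons, BorderLayout.SOUTH);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

Wrapping the JTextArea in a JScrollPane is what makes the results scrollable. Replacing the text on every load, rather than appending to it, keeps the output for one file from running into the next.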

Due date: Thursday, September 19.