SAX Assignment
Fall 2002, David Matuszek

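SAX parses a document sequentially, making a callback to your program for each item it encounters in the XML file (a start tag, an end tag, a run of text, and so on). Because SAX never builds a tree of the whole document in memory, it is very fast and uses very little memory. That makes it ideal for tasks that need only a single pass through the document.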
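Here is a minimal sketch of those callbacks, using the standard JAXP classes (`SAXParserFactory`, `DefaultHandler`). The handler below simply records each callback in the order the parser makes it. The class name `CallbackLogger` is illustrative and is not part of any API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Logs each SAX callback in the order the parser makes it (illustrative class). */
public class CallbackLogger extends DefaultHandler {
    // Record of callbacks, e.g. "start:item", "text:42", "end:item"
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {          // ignore whitespace between tags
            events.add("text:" + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    /** Parses an XML string in one sequential pass and returns the callback log. */
    public static List<String> log(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CallbackLogger handler = new CallbackLogger();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String e : log("<order><item>42</item></order>")) {
            System.out.println(e);
        }
    }
}
```

Running `main` should print `start:order`, `start:item`, `text:42`, `end:item`, `end:order`, one per line. A parser is allowed to split a run of text across several `characters` calls, so real code should not assume exactly one call per text node.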

For example, if you have a large XML file containing item numbers and their prices, and someone places an order, it is easy to go through the XML document once to find the price of each ordered item, then print out an invoice.
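The invoice example might look roughly like the following. The catalog format (`<item number="..." price="..."/>`) and the class `PriceLookup` are assumptions made for illustration. The point is that each item is examined once, as the parser reaches it, and only the prices of ordered items are kept in memory.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * One-pass price lookup (hypothetical class). Assumes a hypothetical catalog
 * format in which each item is an empty element like <item number="A17" price="4.95"/>.
 */
public class PriceLookup extends DefaultHandler {
    private final Set<String> wanted;                                 // item numbers on the order
    private final Map<String, String> prices = new LinkedHashMap<>(); // number -> price, in document order

    PriceLookup(Set<String> wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("item")) {
            String number = atts.getValue("number");   // null if the attribute is absent
            if (number != null && wanted.contains(number)) {
                prices.put(number, atts.getValue("price"));
            }
        }
    }

    /** Scans the catalog once and returns the prices of the ordered items. */
    public static Map<String, String> lookup(String catalogXml, Set<String> order) throws Exception {
        PriceLookup handler = new PriceLookup(order);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(catalogXml)), handler);
        return handler.prices;
    }

    public static void main(String[] args) throws Exception {
        String catalog = "<catalog>"
                + "<item number='A17' price='4.95'/>"
                + "<item number='B02' price='12.50'/>"
                + "<item number='C33' price='0.99'/>"
                + "</catalog>";
        Set<String> order = new HashSet<>(Arrays.asList("A17", "C33"));
        // Print one invoice line for each ordered item found in the catalog
        for (Map.Entry<String, String> e : lookup(catalog, order).entrySet()) {
            System.out.println(e.getKey() + "  $" + e.getValue());
        }
    }
}
```

With the sample catalog in `main`, this should print `A17  $4.95` and then `C33  $0.99`.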

It is harder to come up with a meaningful SAX assignment that uses your existing XML files, since they are all different. I needed an assignment that would get you started on using SAX but is essentially independent of the subject matter of your XML files; hence, the following assignment. Your program should work on any well-formed XML file, including XHTML and XSL files.

Your assignment:

Write a Java program that uses SAX to read in an XML file, finds all tags, attributes, and attribute values, and displays a list of all that occur, along with how many times each tag occurred. For example, if an XHTML file contains both <applet height="300" width="500" code="foo.class"></applet> and <applet height="500" width="500" code="bar.class"></applet>, your program should print something like:

applet (2 occurrences)
     code: foo.class, bar.class
     height: 300, 500
     width: 500

The tags should be listed in alphabetical order and should not be repeated; if a tag occurs multiple times, indicate that with a count (as in the example above). The attributes and values may appear in any order. Within a given tag, do not repeat attribute names (but you don't have to count their occurrences); within a given tag-attribute pair, do not repeat values (but again, don't bother to count them). Hint: This part should be fairly easy if you remember your Collections from CIT594.
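One way to apply the Collections hint, offered as a sketch rather than a required design: a `TreeMap` keyed by tag name gives alphabetical order for free, each tag's attributes map to a `Set` of values so repeats are dropped, and a counter per tag tracks occurrences. The class `TagTable` and its `record` and `report` methods are hypothetical names. To keep the sketch independent of SAX, `record` takes a plain `Map`; a handler's `startElement` could build that map from its `Attributes` argument using `getLength()`, `getQName(i)`, and `getValue(i)`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Tallies tags, attributes, and values (hypothetical class). A TreeMap keeps
 * tag names in alphabetical order; a Set discards repeated attribute values.
 */
public class TagTable {
    /** What has been seen for one tag name. */
    private static class TagInfo {
        int count;
        // attribute name -> distinct values, each kept in first-seen order
        final Map<String, Set<String>> attributes = new LinkedHashMap<>();
    }

    private final Map<String, TagInfo> tags = new TreeMap<>();

    /** Records one occurrence of a tag; attrs holds its attribute name/value pairs. */
    public void record(String tag, Map<String, String> attrs) {
        TagInfo info = tags.get(tag);
        if (info == null) {
            info = new TagInfo();
            tags.put(tag, info);
        }
        info.count++;
        for (Map.Entry<String, String> a : attrs.entrySet()) {
            Set<String> values = info.attributes.get(a.getKey());
            if (values == null) {
                values = new LinkedHashSet<>();
                info.attributes.put(a.getKey(), values);
            }
            values.add(a.getValue());   // a Set silently ignores duplicates
        }
    }

    /** Formats the table in the layout of the applet example above. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, TagInfo> t : tags.entrySet()) {
            TagInfo info = t.getValue();
            sb.append(t.getKey()).append(" (").append(info.count)
              .append(info.count == 1 ? " occurrence)" : " occurrences)").append('\n');
            for (Map.Entry<String, Set<String>> a : info.attributes.entrySet()) {
                sb.append("     ").append(a.getKey()).append(": ")
                  .append(String.join(", ", a.getValue())).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TagTable table = new TagTable();
        Map<String, String> first = new LinkedHashMap<>();
        first.put("height", "300");
        first.put("width", "500");
        first.put("code", "foo.class");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("height", "500");
        second.put("width", "500");
        second.put("code", "bar.class");
        table.record("applet", first);
        table.record("applet", second);
        table.record("param", new HashMap<String, String>());
        System.out.print(table.report());
    }
}
```

Attributes come out here in first-seen order (`height`, `width`, `code`), which the assignment allows. `TreeMap` uses `String`'s natural ordering, which is case-sensitive, so uppercase tag names sort before lowercase ones; passing `String.CASE_INSENSITIVE_ORDER` to the `TreeMap` constructor would change that, but it would also merge tags that differ only in case.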

Use a FileDialog or a JFileChooser to choose the input file; display the results in a scrollable text area on the screen.

It should be possible to use your program on a second file (and a third, and a fourth, ...) without quitting and restarting.

The user interface should be kept very simple: maybe one button to load in a file and display results, and a second button to quit the program. If you add extra features to the program, be sure that the user interface remains totally obvious.

Display your name in the GUI.
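A possible skeleton for the interface described above, using `JFileChooser` (a `java.awt.FileDialog` would work similarly) and a `JTextArea` wrapped in a `JScrollPane`, with a label carrying the author's name. The `ReportViewer` class, its `buildPanel` method, and the `analyzer` parameter are hypothetical; `analyzer` stands in for whatever code turns the chosen file into report text. Because each load simply replaces the text area's contents, the user can examine any number of files without restarting.

```java
import java.awt.BorderLayout;
import java.io.File;
import java.util.function.Function;

import javax.swing.JButton;
import javax.swing.JFileChooser;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

/** Two-button viewer (hypothetical class): Load picks a file and shows its report; Quit exits. */
public class ReportViewer {
    final JTextArea output = new JTextArea(25, 60);
    final JButton load = new JButton("Load file...");
    final JButton quit = new JButton("Quit");

    /** Builds the content panel; 'analyzer' (hypothetical) turns a chosen file into report text. */
    JPanel buildPanel(String author, Function<File, String> analyzer) {
        output.setEditable(false);
        JPanel buttons = new JPanel();
        buttons.add(load);
        buttons.add(quit);

        load.addActionListener(e -> {
            JFileChooser chooser = new JFileChooser();
            if (chooser.showOpenDialog(output) == JFileChooser.APPROVE_OPTION) {
                // Replacing the text lets the user load any number of files in turn
                output.setText(analyzer.apply(chooser.getSelectedFile()));
                output.setCaretPosition(0);    // scroll back to the top
            }
        });
        quit.addActionListener(e -> System.exit(0));

        JPanel panel = new JPanel(new BorderLayout());
        panel.add(new JLabel("Author: " + author), BorderLayout.NORTH);
        panel.add(new JScrollPane(output), BorderLayout.CENTER);
        panel.add(buttons, BorderLayout.SOUTH);
        return panel;
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("SAX Tag Counter");
            // Placeholder analyzer: just echoes the file name
            frame.setContentPane(new ReportViewer().buildPanel("Your Name",
                    f -> "Selected: " + f.getName()));
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```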


Due date:

Thursday, October 31. Since this is a programming assignment, please turn in a zip or jar file via Blackboard. For security reasons, do not post it on your website.