CIT 597 Assignment 3: Text Extractor
Fall 2007, David Matuszek

Purpose of this assignment:

General idea of the assignment:

Write a Java application to read in an HTML page, remove all the HTML tags from it, and display the result in a large scrollable JTextArea.
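One way to build the output window is sketched below. It is not a required design. It puts a read-only `JTextArea` inside a `JScrollPane`, so a long page can be scrolled. The class name `ResultWindow`, the method names, the 30×80 size, and the monospaced font are all illustrative choices.

```java
import java.awt.Font;
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

public class ResultWindow {

    // Wraps the extracted text in a read-only JTextArea inside a JScrollPane,
    // so long pages can be scrolled vertically and horizontally.
    static JScrollPane createResultPane(String text) {
        JTextArea area = new JTextArea(text, 30, 80); // 30 rows x 80 columns: an arbitrary "large" size
        area.setEditable(false);
        area.setFont(new Font(Font.MONOSPACED, Font.PLAIN, 12));
        area.setCaretPosition(0); // start scrolled to the top of the text
        return new JScrollPane(area);
    }

    // Puts the scroll pane in a window on the Swing event thread.
    // Needs a display: this throws HeadlessException on a headless machine.
    static void showResult(String text) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Text Extractor");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(createResultPane(text));
            frame.pack();
            frame.setVisible(true);
        });
    }

    public static void main(String[] args) {
        showResult("Extracted text goes here.");
    }
}
```

The pane is built separately from the frame. That lets you check `createResultPane` without opening a window.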


Use a Swing GUI dialog box to enter a URL into your program. I recommend a JOptionPane as the quickest and easiest way to get the URL (see for a tutorial). Note that you need to type a complete URL (typically starting with http://) into the dialog box.
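Here is a minimal sketch of getting the URL through a `JOptionPane`. It assumes you want to reject input that is not a complete, absolute URL, such as `www.example.com` typed without `http://`. The names `UrlPrompt`, `askForUrl`, and `parseUrl` are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import javax.swing.JOptionPane;

public class UrlPrompt {

    // Shows a simple input dialog; returns null if the user clicks Cancel.
    static String askForUrl() {
        return JOptionPane.showInputDialog(null,
                "Enter a complete URL (for example, http://www.example.com/):",
                "Text Extractor", JOptionPane.QUESTION_MESSAGE);
    }

    // Converts the user's text to a URL, or returns null if it is not a
    // complete, absolute URL (a missing "http://" makes it relative, so it is rejected).
    static URL parseUrl(String text) {
        if (text == null) {
            return null;
        }
        try {
            URI uri = new URI(text.trim());
            return uri.isAbsolute() ? uri.toURL() : null;
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String input = askForUrl();
        if (input == null) {
            return; // user cancelled the dialog
        }
        URL url = parseUrl(input);
        if (url == null) {
            JOptionPane.showMessageDialog(null, "That is not a complete URL.");
        } else {
            System.out.println("Will read: " + url);
        }
    }
}
```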

Go to that URL and get the HTML file that is there; read it as text. See the example at to get the necessary combination of I/O calls. (Note: A 403 return code means that the site does not allow you to connect this way; is an example. Just find a different site.)
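The sketch below reads a page as text. It is not the course's referenced example, which is not reproduced here. It opens a connection, checks for a 403 explicitly so the error message is clear, and reads the body line by line. Two choices are assumptions: decoding the page as UTF-8, and the names `PageReader`, `fetch`, and `readAll`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PageReader {

    // Reads everything from the reader one line at a time,
    // ending each line with '\n'.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader in = new BufferedReader(reader);
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // Fetches the page at url as text. ASSUMPTION: the page is UTF-8 encoded.
    static String fetch(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        if (conn instanceof HttpURLConnection) {
            int code = ((HttpURLConnection) conn).getResponseCode();
            if (code == HttpURLConnection.HTTP_FORBIDDEN) {
                throw new IOException("403 Forbidden: this site refuses the connection; try another URL");
            }
        }
        try (Reader in = new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch(new URI("http://www.example.com/").toURL()));
    }
}
```

`readAll` takes any `Reader`, so you can test it with a `StringReader` without going online.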

Most tags should just be removed, but with the following exceptions:

Use regular expressions to do most of the work of finding and removing or replacing tags.
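The general removal step can be done with one regular expression: `<`, then any run of characters other than `>`, then `>`. The sketch below shows this. It uses a simplifying assumption: a `>` inside an attribute value or an HTML comment will end the match early. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class TagStripper {

    // "<", then any characters except ">", then ">".
    // [^>] also matches newlines, so tags split across lines are removed too.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p class=\"x\">Hello, <b>world</b>!</p>"));
    }
}
```

Compiling the `Pattern` once as a constant avoids recompiling it on every call, as `String.replaceAll` would.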


We will test your program with both simple tags and with tags containing extra whitespace, tags that are split across lines, etc.
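Tags that must be replaced rather than removed need patterns that tolerate case differences, extra whitespace, attributes, and line breaks. The list of exceptions is not reproduced here, so the `<br>`-to-newline rule below is purely a hypothetical illustration of the technique. The names `TagReplacer` and `replaceBreaks` are also illustrative.

```java
import java.util.regex.Pattern;

public class TagReplacer {

    // Hypothetical rule: replace every <br> tag with a newline.
    // (?i) makes the match case-insensitive; \s* allows whitespace after "<";
    // \b stops "br" from matching the start of a longer name such as "<bro>";
    // [^>]* absorbs attributes, "/", and line breaks up to the closing ">".
    private static final Pattern BR = Pattern.compile("(?i)<\\s*br\\b[^>]*>");

    static String replaceBreaks(String html) {
        return BR.matcher(html).replaceAll("\n");
    }

    public static void main(String[] args) {
        System.out.println(replaceBreaks("one<BR>two< br />three<br\n class=\"x\">four"));
    }
}
```

Run replacements like this before the general tag-removal pass. Otherwise, the tags they target will already be gone.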


Due date:

Please turn in your program via Blackboard before midnight, Friday October 12 (extended from Wednesday October 10).