CIT 597 Link Extractor Addendum, Fall 2004, David Matuszek
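To let the user choose a file, use java.awt.FileDialog
or javax.swing.JFileChooser.

Attributes have the form name or name=value. Hyperlinks are <a> tags with
an href attribute; <a> tags without an href attribute mean
something else and should be ignored by your program; and <img> tags have a src attribute.

What if an <a> tag doesn't contain any link text?
Use empty link text, or something similar.

What if an <a> tag encloses an image rather than link text?
Process the <img> tag in the usual way, and use the link's own URL
(only) as the link text. Thus, for <a href=X><img src=Y></a>, put <a href=Y>Y</a> in the
list of images and <a href=X>X</a> in the list of links.

What is the <E> in the Java API for java.util.ArrayList?
It is a type parameter (generics, new in Java 5.0): E stands for the element
type, so an ArrayList<String> holds only Strings.

Each filter should get the URL as a String and, based on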
some tests on that String, decide whether to keep or discard the URL. If this
were a "real" program, with a "real" API, then your filters
would accept everything, and your "real" users would override your
filters to accept only the URLs they are interested in. Your filter should
also get the attributes and values in some form (a HashMap is
ideal), in case your hypothetical users want to use some of that information
in making their filtering decisions. But since this is not a real API, just
an assignment, you should write example filters that accept some things and
reject others (tell us which). To make grading easier, I suggest
you reject .gif images and hyperlinks that contain a '#'
character.

An absolute pathname begins with "/" (slash); a
relative pathname does not. For example, if page AAA/BBB/...XXX/index.html
contains <a href="foo/bar.html">,
that means that the directory that contains index.html also contains
a subdirectory foo, and foo contains bar.html.
This is a path relative to the location XXX,
which is not necessarily where your output goes. Your output should
contain absolute paths. On most systems this would probably be file://AAA/BBB/...XXX/foo/bar.html,
but for some reason, on Windows we seem to need file://localhost/C:/AAA/BBB/...XXX/foo/bar.html.
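Turning a relative href into an absolute one is what java.net.URI.resolve does: it interprets the href relative to the URL of the page that contains it. The sketch below assumes the page's own location is already known as a file URL; the class name ResolveLinks, the helper resolve, and the AAA/BBB/XXX paths are hypothetical placeholders (the "..." directories are simply left out).

```java
import java.net.URI;

public class ResolveLinks {

    /**
     * Resolves an href found on a page against that page's own URL.
     * A relative href (no leading "/") is taken relative to the page's
     * directory; an absolute href (leading "/") replaces the page's path.
     * The base URL's scheme and host are kept as they are.
     */
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page locations; AAA/BBB/XXX stand in for real directories.
        String page = "file:/AAA/BBB/XXX/index.html";
        String windowsPage = "file://localhost/C:/AAA/BBB/XXX/index.html";

        // prints file:/AAA/BBB/XXX/foo/bar.html
        System.out.println(resolve(page, "foo/bar.html"));
        // prints file:/other/baz.html
        System.out.println(resolve(page, "/other/baz.html"));
        // prints file://localhost/C:/AAA/BBB/XXX/foo/bar.html
        System.out.println(resolve(windowsPage, "foo/bar.html"));
    }
}
```

Because resolve keeps the base URL's scheme and host, starting from a Windows-style file://localhost/C:/... page URL yields output in that same form; which form your browser accepts is still worth checking on your own machine.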
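The filter design described earlier (each filter gets the URL as a String plus the attributes in a map, a "real" default accepts everything, and users override it) might be sketched as follows. The names LinkFilter, AcceptAllFilter, NoGifFilter, and NoFragmentFilter are hypothetical, not a required API; the two example filters reject .gif images and hyperlinks containing '#', as suggested above.

```java
import java.util.HashMap;
import java.util.Map;

public class FilterDemo {

    /** Hypothetical filter interface: decides whether to keep a URL. */
    interface LinkFilter {
        boolean accept(String url, Map<String, String> attributes);
    }

    /** "Real API" default: accept everything; users would override this. */
    static class AcceptAllFilter implements LinkFilter {
        public boolean accept(String url, Map<String, String> attributes) {
            return true;
        }
    }

    /** Example image filter: rejects URLs ending in .gif (case-insensitive). */
    static class NoGifFilter extends AcceptAllFilter {
        @Override
        public boolean accept(String url, Map<String, String> attributes) {
            return !url.toLowerCase().endsWith(".gif");
        }
    }

    /** Example hyperlink filter: rejects any URL that contains '#'. */
    static class NoFragmentFilter extends AcceptAllFilter {
        @Override
        public boolean accept(String url, Map<String, String> attributes) {
            return url.indexOf('#') < 0;
        }
    }

    public static void main(String[] args) {
        // Hypothetical attributes of an <img> tag, stored name -> value.
        Map<String, String> attrs = new HashMap<>();
        attrs.put("src", "logo.gif");
        attrs.put("alt", "Logo");

        LinkFilter images = new NoGifFilter();
        LinkFilter links = new NoFragmentFilter();
        System.out.println(images.accept("logo.gif", attrs));     // false
        System.out.println(images.accept("photo.jpg", attrs));    // true
        System.out.println(links.accept("page.html#top", attrs)); // false
        System.out.println(links.accept("page.html", attrs));     // true
    }
}
```

These example filters ignore the attribute map; it is passed in only so that an overriding filter could base its decision on, say, an alt or title value.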