CIT 594 Assignment 4:
Tokenizer
Spring 2009, David Matuszek
Write a program that, given a java.io.Reader, reads a stream
from it and returns a series of tokens.
StringReader will allow you to read Strings; this is good for testing purposes.
FileReader will allow you to read from files; this is how your Tokenizer is most useful.
A token is one of: A word (a name or a keyword), a number, a punctuation mark, an end-of-line, or an end-of-input.
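Because both StringReader and FileReader are subclasses of java.io.Reader, code written against Reader works with either source. A minimal sketch of that idea follows; the class ReaderSources and the helper countChars are hypothetical names used only for illustration, and a temporary file stands in for a real input file.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;

class ReaderSources {
    /** Hypothetical helper: counts the characters a Reader delivers. */
    static int countChars(Reader reader) throws IOException {
        int count = 0;
        while (reader.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In-memory input, convenient for unit tests.
        System.out.println(countChars(new StringReader("x = 1")));

        // File input; a temporary file stands in for a real source file.
        Path tmp = Files.createTempFile("tokens", ".txt");
        Files.write(tmp, "x = 1".getBytes());
        try (Reader r = new FileReader(tmp.toFile())) {
            System.out.println(countChars(r));
        }
        Files.delete(tmp);
    }
}
```

The same method body serves both cases, which is why a Tokenizer constructor that accepts a Reader can be tested with Strings and used with files.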
Write your program using Eclipse. Provide complete JUnit tests for your program, and document it fully with javadoc comments.
Write the following classes:
TokenType
enum TokenType { NAME, KEYWORD, NUMBER, SYMBOL, EOL, EOI, ERROR }
Token
A Token has a type and a value. The value
is a String containing the exact characters that make up the
Token. The type is a TokenType that tells what kind
of thing the Token represents--a variable, keyword, number,
or symbol (punctuation mark). End-of-lines and the (one) end of input are
also returned as tokens. These have the following meanings:
NAME -- begins with a letter or an underscore, consists of letters, digits, and underscores
KEYWORD -- same as a NAME, but in addition is a value that occurs in a keywords array: if, else, while, do, for, switch, case, maybe a few others
NUMBER -- one or more digits
SYMBOL -- any single punctuation mark (not including whitespace)
EOL -- an end of line character (value should be "\n")
EOI -- the end of input (value can be an empty string, "")
ERROR -- an error (for now, trying to get another token after getting an EOI)
The token class should also include the following constructors/methods:
public Token(TokenType type, String value) -- constructor
public String getValue() -- a getter method for the token's value
public TokenType getType() -- a getter method for the token's type
public boolean equals(Object o) -- a test for equality of tokens
Tokenizer
The Tokenizer will have at least the following constructors/methods:
public Tokenizer(Reader reader) -- constructor (sets the String or File to be tokenized)
public boolean hasNext() -- returns true if there are more tokens to be returned.
public Token next() -- returns the next token from the string.
public void backUp() -- "backs up" one token, so that whatever was returned from the most recent call to next() will be returned again the next time next() is called; only one token is remembered, so if you call backUp() multiple times, the second and subsequent calls don't make any difference.
The Tokenizer should ignore (skip over) all types of comments.
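The Token methods and the backUp() contract described above can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes that two tokens are equal when both type and value match (the assignment does not spell this out), and BackUpBuffer is a hypothetical helper that shows the one-token memory over any Iterator of tokens. The classes are left package-private only so that the sketch fits in a single file.

```java
import java.util.Iterator;
import java.util.Objects;

enum TokenType { NAME, KEYWORD, NUMBER, SYMBOL, EOL, EOI, ERROR }

class Token {
    private final TokenType type;
    private final String value;

    public Token(TokenType type, String value) {
        this.type = type;
        this.value = value;
    }

    public String getValue() { return value; }

    public TokenType getType() { return type; }

    /** Assumption: tokens are equal when both type and value match. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Token)) return false;
        Token other = (Token) o;
        return type == other.type && Objects.equals(value, other.value);
    }

    /** Kept consistent with equals(), as the Object contract requires. */
    @Override
    public int hashCode() {
        return Objects.hash(type, value);
    }
}

/** Hypothetical helper (not part of the assignment): one-token backUp(). */
class BackUpBuffer {
    private final Iterator<Token> source;
    private Token last;        // most recent token handed out by next()
    private boolean backedUp;  // true when 'last' must be returned again

    BackUpBuffer(Iterator<Token> source) {
        this.source = source;
    }

    Token next() {
        if (backedUp) {
            backedUp = false;
            return last;
        }
        last = source.next();
        return last;
    }

    void backUp() {
        // Only a flag is set, so repeated calls have no further effect.
        if (last != null) {
            backedUp = true;
        }
    }
}
```

In a façade over StreamTokenizer, the built-in pushBack() method offers similar one-token behavior and may be the simpler route.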
TokenTest
A JUnit test class for Token. Pretty simple.
TokenizerTest
A JUnit test class for your Tokenizer. Be very careful,
and be aware that some characters may need to be marked "ordinary."
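The remark about "ordinary" characters refers to StreamTokenizer's default syntax table, in which '"' and '\'' start quoted strings, '/' starts a comment that runs to the end of the line, and '-' and '.' count as parts of numbers (and can be absorbed into words). The sketch below compares the default behavior with the behavior after ordinaryChar() has been called on those characters. OrdinaryCharDemo and scan are hypothetical names, and the choice of which characters to reset is an assumption about what the tests will need.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class OrdinaryCharDemo {
    /** Hypothetical helper: lists the raw tokens StreamTokenizer reports. */
    static List<String> scan(String input, boolean markOrdinary) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (markOrdinary) {
            // Strip the special meaning from these characters.
            for (char c : new char[] {'-', '.', '/', '"', '\''}) {
                st.ordinaryChar(c);
            }
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD(" + st.sval + ")");
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("NUM(" + st.nval + ")");
                } else if (st.sval != null) {
                    out.add("QUOTED(" + st.sval + ")"); // quote character with a string body
                } else {
                    out.add("CHAR(" + (char) st.ttype + ")");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "a-b \"c\" / d";
        // prints [WORD(a-b), QUOTED(c)]
        System.out.println(scan(input, false));
        // prints [WORD(a), CHAR(-), WORD(b), CHAR("), WORD(c), CHAR("), CHAR(/), WORD(d)]
        System.out.println(scan(input, true));
    }
}
```

With the defaults, the hyphen is swallowed into the word, the quoted text comes back as one token, and everything after the slash disappears, so tests that expect single-character SYMBOL tokens would fail.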
Write your Tokenizer class as a façade for the
StreamTokenizer class. That is, your constructor for
Tokenizer will create an instance of
StreamTokenizer that will subsequently be used by the
Tokenizer methods.
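One possible shape for such a façade is sketched below, showing only the constructor and next(); hasNext() and backUp() are omitted. Several choices here are assumptions rather than requirements: digits are re-registered as word characters so that a NUMBER keeps its exact characters, only // and /* */ comments are skipped, the ERROR token carries an empty value, a digit-led mix such as 1x is reported as ERROR, and the keyword set contains just the examples named in the assignment. The Token class is a minimal stand-in without equals().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

enum TokenType { NAME, KEYWORD, NUMBER, SYMBOL, EOL, EOI, ERROR }

// Minimal stand-in for the assignment's Token class (equals() omitted).
class Token {
    private final TokenType type;
    private final String value;
    public Token(TokenType type, String value) { this.type = type; this.value = value; }
    public TokenType getType() { return type; }
    public String getValue() { return value; }
}

class Tokenizer {
    // Assumed keyword list, taken from the examples in the assignment text.
    private static final Set<String> KEYWORDS = new HashSet<>(
            Arrays.asList("if", "else", "while", "do", "for", "switch", "case"));

    private final StreamTokenizer st;  // the wrapped tokenizer
    private boolean sawEoi = false;    // true once EOI has been returned

    public Tokenizer(Reader reader) {
        st = new StreamTokenizer(reader);
        st.eolIsSignificant(true);     // report line ends as tokens
        st.ordinaryChars('0', '9');    // turn off numeric parsing of digits...
        st.wordChars('0', '9');        // ...and let them appear inside words
        st.wordChars('_', '_');
        for (char c : new char[] {'-', '.', '/', '"', '\''}) {
            st.ordinaryChar(c);        // plain single-character symbols
        }
        st.slashSlashComments(true);   // skip // comments
        st.slashStarComments(true);    // skip /* ... */ comments
    }

    public Token next() {
        if (sawEoi) {
            return new Token(TokenType.ERROR, ""); // value "" is an assumption
        }
        int t;
        try {
            t = st.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        switch (t) {
            case StreamTokenizer.TT_EOF:
                sawEoi = true;
                return new Token(TokenType.EOI, "");
            case StreamTokenizer.TT_EOL:
                return new Token(TokenType.EOL, "\n");
            case StreamTokenizer.TT_WORD:
                return new Token(classify(st.sval), st.sval);
            default:
                return new Token(TokenType.SYMBOL, String.valueOf((char) t));
        }
    }

    private static TokenType classify(String word) {
        if (word.chars().allMatch(Character::isDigit)) return TokenType.NUMBER;
        // Assumption: a digit-led mix such as "1x" is not split, just flagged.
        if (Character.isDigit(word.charAt(0))) return TokenType.ERROR;
        return KEYWORDS.contains(word) ? TokenType.KEYWORD : TokenType.NAME;
    }
}
```

Under this configuration, tokenizing the text "if x1 // note", a newline, and "42;" should yield KEYWORD if, NAME x1, EOL, NUMBER 42, SYMBOL ;, and EOI, followed by ERROR on any further call. Leaving StreamTokenizer's own number parsing switched on is an alternative, but it reports numbers as double values rather than as the characters read.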
You don't need a main method. All your testing can be done
via JUnit.
The Token and Tokenizer classes should be quite
simple. You will probably find the TokenizerTest class much more
challenging than the Tokenizer class itself. There are two
significant advantages to having the JUnit tests, though:
Token and Tokenizer classes, and the JUnit tests
will be very helpful in making those modifications.
Due date:
Tuesday, February 16, before midnight (zipped and submitted via Blackboard).