CIT 594 Assignment 4: Tokenizer
Spring 2009, David Matuszek

Purposes:

General Idea:

Write a program that, given a java.io.Reader, reads a stream from it and returns a series of tokens.

A token is one of the following: a word (a name or a keyword), a number, a punctuation mark, an end-of-line, or an end-of-input.

Write your program using Eclipse. Provide complete JUnit tests for your program, and document it fully with Javadoc comments.

Details:

Write the following classes:

TokenType

This class is defined as:

enum TokenType { NAME, KEYWORD, NUMBER, SYMBOL, EOL, EOI, ERROR }

Token

A Token has a type and a value. The value is a String containing the exact characters that make up the Token. The type is a TokenType that tells what kind of thing the Token represents: a name (such as a variable), a keyword, a number, or a symbol (punctuation mark). End-of-lines and the (one) end-of-input are also returned as tokens. The TokenType values have the following meanings:

The Token class should also include the following constructors/methods:

Tokenizer

The Tokenizer will have at least the following constructors/methods:

The Tokenizer should ignore (skip over) all types of comments.
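
One way comment skipping might be set up with java.io.StreamTokenizer is sketched here, assuming that "all types of comments" means Java-style // and /* */ comments. The class name CommentSkipDemo and its tokens helper are hypothetical and are not among the required classes. Note that a newly created StreamTokenizer treats a lone '/' as the start of a comment, so this sketch first makes '/' ordinary and then turns on the two comment styles.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not one of the classes required by the assignment.
class CommentSkipDemo {

    // Returns the tokens of the input as strings, skipping // and /* */ comments.
    static List<String> tokens(String input) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');          // by default a lone '/' starts a comment
        st.slashSlashComments(true);   // skip "// ..." up to the end of the line
        st.slashStarComments(true);    // skip "/* ... */"
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                out.add(st.sval);                       // a word
            } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                out.add(String.valueOf(st.nval));       // a number (always a double)
            } else {
                out.add(String.valueOf((char) st.ttype)); // a single ordinary character
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("a / b // line comment\n/* block */ e"));
    }
}
```

With these settings the lone '/' between a and b should come back as a symbol, while both comments are dropped.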

TokenTest

A JUnit test class for Token. Pretty simple.

TokenizerTest

A JUnit test class for your Tokenizer. Be very careful, and be aware that some characters may need to be marked "ordinary" (see the ordinaryChar method of StreamTokenizer).
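
To illustrate why marking characters "ordinary" can matter, here is a small sketch that tokenizes the same input with and without calls to StreamTokenizer's ordinaryChar method. The class OrdinaryCharDemo and its tokens helper are hypothetical names used only for illustration. It relies on two default behaviors of StreamTokenizer: '/' starts a comment that runs to the end of the line, and '-' counts as a numeric character, so it can be absorbed into a word that is already in progress.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not one of the classes required by the assignment.
class OrdinaryCharDemo {

    // Tokenizes the input after marking each of the given characters as ordinary.
    static List<String> tokens(String input, char... ordinary) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        for (char c : ordinary) {
            st.ordinaryChar(c);   // the character becomes a one-character token
        }
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                out.add(st.sval);
            } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                out.add(String.valueOf(st.nval));
            } else {
                out.add(String.valueOf((char) st.ttype));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("a / b"));        // default settings
        System.out.println(tokens("a / b", '/'));   // '/' marked ordinary
        System.out.println(tokens("x-1"));          // default settings
        System.out.println(tokens("x-1", '-'));     // '-' marked ordinary
    }
}
```

Tests of this kind are a quick way to discover which characters the default settings handle in a surprising way.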

Comments:

Write your Tokenizer class as a façade for the StreamTokenizer class. That is, your constructor for Tokenizer will create an instance of StreamTokenizer that will subsequently be used by the Tokenizer methods.
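
The façade idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the required design: the class name FacadeSketch and the method nextType are hypothetical, keywords are not distinguished from names, and quoted strings and error handling are left out. The TokenType enum is repeated from the definition above so that the sketch compiles on its own.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Repeated from the assignment so that this sketch is self-contained.
enum TokenType { NAME, KEYWORD, NUMBER, SYMBOL, EOL, EOI, ERROR }

// Hypothetical facade; the class and method names are illustrative only.
class FacadeSketch {
    private final StreamTokenizer st;   // the wrapped tokenizer

    // The constructor creates the StreamTokenizer that later calls delegate to.
    FacadeSketch(Reader reader) {
        st = new StreamTokenizer(reader);
        st.eolIsSignificant(true);      // report line ends instead of skipping them
    }

    // Advances the wrapped tokenizer and classifies what it found.
    // Keywords are reported as NAME here; quoted strings fall into SYMBOL.
    TokenType nextType() throws IOException {
        switch (st.nextToken()) {
            case StreamTokenizer.TT_WORD:   return TokenType.NAME;
            case StreamTokenizer.TT_NUMBER: return TokenType.NUMBER;
            case StreamTokenizer.TT_EOL:    return TokenType.EOL;
            case StreamTokenizer.TT_EOF:    return TokenType.EOI;
            default:                        return TokenType.SYMBOL;
        }
    }

    public static void main(String[] args) throws IOException {
        FacadeSketch f = new FacadeSketch(new StringReader("x + 2\ny"));
        TokenType t;
        do {
            t = f.nextType();
            System.out.println(t);
        } while (t != TokenType.EOI);
    }
}
```

Callers of such a class never touch StreamTokenizer directly, which is the point of a façade: the wrapped object's settings and quirks stay hidden behind a smaller interface.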

You don't need a main method. All your testing can be done via JUnit.

The Token and Tokenizer classes should be quite simple. You will probably find the TokenizerTest class much more challenging than the Tokenizer class itself. There are two significant advantages to having the JUnit tests, though:

Due date:

Tuesday, February 16, before midnight (zipped and submitted via Blackboard).