CIT 594 Assignment 1: Tokenizer
Spring 2004, David Matuszek

Purposes:

General Idea:

Write a program that, given a String, breaks it into a series of tokens. A token is one of the following: a word, a number, a punctuation mark, an end-of-line, or an end-of-input.

Write your program using Eclipse. Provide complete JUnit tests for your program, and document it fully with javadoc comments.

Details:

Write the following classes:

Token

A Token has a type and a value. The value is a String containing the exact characters that make up the Token. The type is an int that tells what kind of thing the Token represents--a name, a number, or a symbol (punctuation mark). End-of-lines and the (one) end of input are also returned as tokens.

Your Token class should include the following public static final ints:

  • NAME -- begins with a letter, consists of letters, digits, and underscores
  • NUMBER -- one or more digits (only)
  • SYMBOL -- any single punctuation mark (not including whitespace)
  • EOL -- an end-of-line character
  • EOI -- the end of input
  • ERROR -- an error (for now, trying to get another token after getting an EOI)

The Token class should also include the following constructors/methods:

  • public Token(String value, int type) -- constructor
  • public String getValue() -- a getter method for the token's value
  • public int getType() -- a getter method for the token's type
  • public boolean equals(Object o) -- a test for equality of tokens
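
As a rough illustration of the requirements above, here is a minimal sketch of what a Token class might look like. The numeric values of the constants are arbitrary assumptions (the assignment only requires that the constants exist), and the hashCode method is an addition not listed above; it is included only because Java convention expects it to be overridden whenever equals is.

```java
import java.util.Objects;

class Token {
    // The actual int values are an assumption; only the names are required.
    public static final int NAME = 0;
    public static final int NUMBER = 1;
    public static final int SYMBOL = 2;
    public static final int EOL = 3;
    public static final int EOI = 4;
    public static final int ERROR = 5;

    private final String value; // the exact characters that make up the token
    private final int type;     // one of the constants above

    public Token(String value, int type) {
        this.value = value;
        this.type = type;
    }

    public String getValue() {
        return value;
    }

    public int getType() {
        return type;
    }

    /** Two tokens are treated as equal when both type and value match. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Token)) return false; // also rejects null
        Token other = (Token) o;
        return type == other.type && Objects.equals(value, other.value);
    }

    /** Not required by the assignment; kept consistent with equals. */
    @Override
    public int hashCode() {
        return Objects.hash(value, type);
    }
}
```

With this definition, new Token("abc", Token.NAME) is equal to another Token built from the same arguments, but not to a Token with a different type, to a plain String, or to null.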

Tokenizer

The Tokenizer will have at least the following constructors/methods:

  • public Tokenizer(String input) -- constructor (sets the string to be tokenized)
  • public boolean hasNext() -- returns true if there are more tokens to be returned.
  • public Token next() -- returns the next token from the string.

TokenTest

A JUnit test class for Token. You will be graded on completeness.

TokenizerTest

A JUnit test class for Tokenizer. You will be graded on completeness.

Comments:

Probably the simplest way to write this program is to create an instance of StringTokenizer inside your Tokenizer class, and adapt its results to the requirements of this assignment.
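
One possible shape for this adapter is sketched below; treat it as an illustration under stated assumptions, not as the required design. It relies on the three-argument StringTokenizer constructor, whose returnDelims flag makes each delimiter come back as a one-character piece. The delimiter set, the pending field, and the decision that hasNext() stays true until the EOI token has been returned are all assumptions made for this sketch. The Token class here is a trimmed stand-in so that the example compiles on its own.

```java
import java.util.StringTokenizer;

// Trimmed stand-in for Token (no equals) so the sketch is self-contained.
class Token {
    static final int NAME = 0, NUMBER = 1, SYMBOL = 2, EOL = 3, EOI = 4, ERROR = 5; // arbitrary values
    private final String value;
    private final int type;
    Token(String value, int type) { this.value = value; this.type = type; }
    String getValue() { return value; }
    int getType() { return type; }
}

class Tokenizer {
    // Assumed delimiter set: whitespace plus ASCII punctuation,
    // except '_' (which may appear inside a NAME).
    private static final String DELIMITERS =
            " \t\r\n!\"#$%&'()*+,-./:;<=>?@[\\]^`{|}~";

    private final StringTokenizer pieces;
    private String pending = "";          // rest of a piece that held more than one token
    private boolean endReturned = false;  // true once EOI has been handed out

    public Tokenizer(String input) {
        // returnDelims = true: punctuation and newlines come back as pieces too
        pieces = new StringTokenizer(input, DELIMITERS, true);
    }

    /** True until the EOI token has been returned (one reading of the spec). */
    public boolean hasNext() {
        return !endReturned;
    }

    public Token next() {
        if (endReturned) return new Token("", Token.ERROR);
        while (true) {
            String piece;
            if (!pending.isEmpty()) {
                piece = pending;
                pending = "";
            } else if (pieces.hasMoreTokens()) {
                piece = pieces.nextToken();
            } else {
                endReturned = true;
                return new Token("", Token.EOI);
            }

            char first = piece.charAt(0);
            if (first == '\n') return new Token(piece, Token.EOL);
            if (Character.isWhitespace(first)) {  // skip other whitespace
                pending = piece.substring(1);
                continue;
            }

            int end = 1;              // a SYMBOL is always a single character
            int type = Token.SYMBOL;
            if (Character.isLetter(first)) {
                type = Token.NAME;
                while (end < piece.length() && isNameChar(piece.charAt(end))) end++;
            } else if (Character.isDigit(first)) {
                type = Token.NUMBER;
                while (end < piece.length() && Character.isDigit(piece.charAt(end))) end++;
            }
            pending = piece.substring(end); // e.g. "12abc" leaves "abc" for the next call
            return new Token(piece.substring(0, end), type);
        }
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

Given the input "x1 = 42;" followed by a newline, successive calls to next() in this sketch yield NAME "x1", SYMBOL "=", NUMBER "42", SYMBOL ";", EOL, and EOI; any further call yields an ERROR token. The pending field exists because StringTokenizer splits only at delimiters, so a piece such as "12abc" has to be divided into a NUMBER and a NAME by hand.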

You don't need a main method. All your testing can be done via JUnit.

The Token and Tokenizer classes should be quite simple. The difficult parts of this assignment are (1) getting used to Eclipse, and (2) learning to construct JUnit tests. You will probably find the TokenizerTest class much more challenging than the Tokenizer class itself. There are two significant advantages to having the JUnit tests, though:

Due date:

Tuesday, January 19, before midnight (zipped and submitted via Blackboard).