CIT 594 Assignment 4: Tokenizer
Spring 2012, David Matuszek

Purposes:

General Idea:

Write a program that decomposes a String into a series of tokens.

A token is one "thing" in the input string; for instance, a word or a number. Since we will be writing a tokenizer for computer programs, we will also have tokens representing operators, and grouping symbols such as parentheses.

Write your program using Eclipse. Use regular expressions to recognize each token. Provide complete JUnit tests for your program, and document it fully with javadoc comments.
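
As a rough illustration of using regular expressions to recognize tokens, the sketch below tries each pattern at the current position with Matcher.region and Matcher.lookingAt. The class name RegexSketch, the helper methods matchAt and scan, the two patterns, and the "OTHER" label are all assumptions made for this example; they are not the required design, and the token rules in this assignment take precedence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: names and patterns here are illustrative assumptions only.
class RegexSketch {
    // One or more digits.
    private static final Pattern INT_PATTERN = Pattern.compile("\\d+");
    // A letter or underscore, followed by letters, digits, or underscores.
    private static final Pattern NAME_PATTERN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns the text the pattern matches starting exactly at start, or null if none. */
    static String matchAt(Pattern pattern, String input, int start) {
        Matcher m = pattern.matcher(input);
        m.region(start, input.length());       // restrict matching to input[start..]
        return m.lookingAt() ? m.group() : null; // lookingAt anchors at the region start
    }

    /** Walks the input left to right and labels each piece it recognizes. */
    static List<String> scan(String input) {
        List<String> found = new ArrayList<String>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;                          // skip whitespace between pieces
                continue;
            }
            String label = "INT";
            String text = matchAt(INT_PATTERN, input, pos);
            if (text == null) {
                label = "NAME";
                text = matchAt(NAME_PATTERN, input, pos);
            }
            if (text == null) {
                label = "OTHER";                // anything else: take one character
                text = input.substring(pos, pos + 1);
            }
            found.add(label + ":" + text);
            pos += text.length();               // continue after the matched text
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(scan("count = 42")); // prints [NAME:count, OTHER:=, INT:42]
    }
}
```

Using lookingAt rather than find keeps the match anchored at the current position, so the scanner cannot silently skip over characters it does not recognize.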

Details:

Write the following classes:

TokenType
The entire definition for this class is:
public enum TokenType {
    INT, FLOAT, NAME, OPERATOR, GROUPING_SYMBOL, STRING, COMMENT, ERROR;
}
Token

A Token has a type and a text. The type is one of the above TokenType values. The text is a String containing the exact characters that make up the Token. Types are as follows:

The Token class should have the following constructor:

the following instance variables:

and the following methods:

Tokenizer (implements Iterator)

The Tokenizer will have the following constructor and methods:
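
Independent of the specific constructor and methods required for the Tokenizer, implementing java.util.Iterator means supplying hasNext(), next(), and remove(). The sketch below shows that contract on a deliberately simpler, hypothetical class, WordIterator, which only splits a String on whitespace. Its name, constructor, and behavior are assumptions for illustration and are not the Tokenizer specification.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example class; it is not the Tokenizer required by this assignment.
class WordIterator implements Iterator<String> {
    private final String[] words;
    private int index = 0;   // position of the next word to return

    WordIterator(String input) {
        String trimmed = input.trim();
        // Splitting an empty string would yield one empty word, so handle it separately.
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** True while at least one more word remains. */
    public boolean hasNext() {
        return index < words.length;
    }

    /** Returns the next word, or throws NoSuchElementException when none remain. */
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return words[index++];
    }

    /** Removal makes no sense for a read-only scan, so it is unsupported. */
    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        Iterator<String> it = new WordIterator("x = y + 1");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Throwing UnsupportedOperationException from remove() is the conventional choice the Iterator documentation allows when removal is not meaningful.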

TokenTest

A JUnit test class for Token. Pretty simple.

TokenizerTest

A JUnit test class for your Tokenizer.

Comments:

You don't need a main method. All your testing can be done via JUnit.

Due date

Your program is due before 6am Tuesday, February 14. Zip up your entire Eclipse project, and submit via Blackboard. Only assignments submitted via Blackboard will be accepted--any assignments emailed to me will be discarded without comment. A late penalty of 5 points per day (out of 100 points) will apply.

Because many of you are interviewing this semester, a limited number of 48-hour extensions will be available. To get an extension, email me before 5pm Friday, stating the reason you need the extension. No extensions will be granted after Friday. If you get an extension and fail to get the project in by the extended due date, late points will be counted from the original due date.