CIT 594 Assignment 5: Tokenizer
Spring 2014, David Matuszek

General Idea

Write a program that, given an input source, reads a stream of characters from it and returns a series of tokens.

A token is one of: a word (a name or a keyword), a number, a punctuation mark, an end-of-line, or an end-of-input.

Write your program as a state machine. Provide complete JUnit tests for your program, and document it fully with javadoc comments.
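
To illustrate what "as a state machine" can mean in practice, here is a minimal sketch: a single state variable, and a loop that decides what to do from the current state and the current character. Everything named here is an assumption made for illustration (the class StateMachineSketch, the scan methods, the three states, the use of java.io.Reader, and the "TYPE:text" output strings); it is not the required Tokenizer API. It also simplifies freely: numbers are digits only, every word is reported as NAME because no keyword list is assumed, and there is no error handling.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: all names here are assumptions, not the required API.
final class StateMachineSketch {
    // The machine is always in exactly one of these (illustrative) states.
    private enum State { START, IN_WORD, IN_NUMBER }

    /** Convenience wrapper so callers need not handle IOException. */
    static List<String> scan(String text) {
        try {
            return scan(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns tokens as "TYPE:text" strings ("EOL" and "EOI" carry no text). */
    static List<String> scan(Reader in) throws IOException {
        List<String> out = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        State state = State.START;
        int c = in.read();                       // current character, or -1 at end
        while (true) {
            switch (state) {
                case START:
                    if (c == -1) { out.add("EOI"); return out; }
                    if (Character.isLetter(c)) {
                        text.append((char) c); state = State.IN_WORD;
                    } else if (Character.isDigit(c)) {
                        text.append((char) c); state = State.IN_NUMBER;
                    } else if (c == '\n') {
                        out.add("EOL");
                    } else if (!Character.isWhitespace(c)) {
                        out.add("SYMBOL:" + (char) c);
                    }                            // other whitespace is skipped
                    c = in.read();               // START always consumes c
                    break;
                case IN_WORD:
                    if (c != -1 && Character.isLetterOrDigit(c)) {
                        text.append((char) c); c = in.read();
                    } else {                     // word ended; do not consume c
                        out.add("NAME:" + text); text.setLength(0);
                        state = State.START;
                    }
                    break;
                case IN_NUMBER:
                    if (c != -1 && Character.isDigit(c)) {
                        text.append((char) c); c = in.read();
                    } else {                     // number ended; do not consume c
                        out.add("NUMBER:" + text); text.setLength(0);
                        state = State.START;
                    }
                    break;
            }
        }
    }
}
```

The point of the design is that the character ending a word or number is not consumed by that state; control returns to START, which then classifies that same character. A real Tokenizer would more likely return one token per call rather than a whole list, but the state transitions would look much the same.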


Write the following classes:

public enum TokenType
This enum is completely defined as:
public enum TokenType { NAME, KEYWORD, NUMBER, SYMBOL, EOL, EOI, ERROR }

public class Token

A Token has a type and a value. The value is a String containing the exact characters that make up the Token. The type is a TokenType that tells what kind of thing the Token represents: a name (such as a variable), keyword, number, or symbol (punctuation mark). End-of-lines and the (one) end-of-input are also returned as tokens. These have the following meanings:

The Token class should also include the following constructors/methods:
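
The signatures that count are the ones this assignment specifies. Purely as an illustration of the "type plus value" idea, a Token might be sketched as a small immutable class like the one below. The constructor, the accessor names getType and getValue, and the equals/hashCode/toString overrides are assumptions made for this example, and the types are package-private only so that the sketch fits in a single file.

```java
import java.util.Objects;

// In the assignment these are public types in their own files;
// they are package-private here only to keep the sketch in one file.
enum TokenType { NAME, KEYWORD, NUMBER, SYMBOL, EOL, EOI, ERROR }

/** Hypothetical sketch of an immutable Token; member names are assumptions. */
final class Token {
    private final TokenType type;
    private final String value;

    Token(TokenType type, String value) {
        this.type = Objects.requireNonNull(type, "type");
        this.value = Objects.requireNonNull(value, "value");
    }

    TokenType getType() { return type; }

    String getValue() { return value; }

    // Two tokens are equal when both their type and their value match.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Token)) return false;
        Token that = (Token) other;
        return type == that.type && value.equals(that.value);
    }

    @Override
    public int hashCode() { return Objects.hash(type, value); }

    @Override
    public String toString() { return type + ":" + value; }
}
```

Overriding equals (together with hashCode) is a design choice that pays off in tests, since it lets a unit test compare an expected Token with an actual one directly instead of checking each field separately.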

public class Tokenizer

The Tokenizer will have at least the following constructors/methods:

public class TokenTest

A JUnit test class for Token. Pretty simple.

public class TokenizerTest

A JUnit test class for your Tokenizer.


Before you begin programming, read over the assignment, then estimate how long it will take you. Turn in a readme.txt file containing both your estimate and the time it actually took you. Also state how much you used test-driven development (entirely/mostly/somewhat/a little/not at all).

This extra information is required (you will lose points if you don't provide it), but the actual numbers, and what you say about TDD, will not affect your grade.

Due date

Turn your assignment in to Canvas before 6 a.m. Thursday, February 27. Late programs, even if only a minute late, will be penalized 10 points for the first week. Programs more than a week late may or may not be accepted for grading.