CIT 594 Assignment 1: Tokenizer
Spring 2006, David Matuszek
Purposes:
- To ensure you are familiar with Eclipse
- To ensure you are familiar with JUnit testing
- To ensure you are familiar with Iterator
- To familiarize you with enum
- To familiarize you with state machines
- To familiarize you with stacks
General Idea:
Write an API that, given a String, breaks it into a series
of tokens. For our purposes, a token is one of: a word, a number, a punctuation mark, or an
end-of-line.
Write your program using Eclipse, and document it fully with javadoc comments.
Details:
Write the following classes:
public enum Type
This enumeration should have the following values: NAME,
NUMBER, SYMBOL, and EOL. (Hint:
This is a very small class.)
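As a rough illustration of how small this enumeration can be, here is a minimal sketch. The TypeDemo class is hypothetical and only there to show the values being used. The enum is declared without "public" so that both types fit in one file; in the assignment it would be public, in its own file.

```java
// Sketch only: the four values come from the assignment text.
enum Type { NAME, NUMBER, SYMBOL, EOL }

// Hypothetical demo class, not part of the assignment.
class TypeDemo {
    public static void main(String[] args) {
        // values() lists the constants in declaration order.
        for (Type t : Type.values()) {
            System.out.println(t);
        }
    }
}
```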
Token
A Token has public fields Type type
and String value. The value holds
the exact characters that make up the Token. The type
tells what kind of thing the Token represents.
NAME
- A name begins with a letter, and may contain any number of letters,
digits, and/or underscores.
NUMBER
- A number consists of one or more digits. A sign (+ or -), if present,
is not part of a number.
SYMBOL
- Any character that isn't whitespace and isn't part of a name or
a number.
EOL
- Denotes the "end of line." For this type, the value should be the empty string.
Note: To correctly recognize letters, digits, and
whitespace, use the methods in java.lang.Character.
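The following sketch shows the java.lang.Character methods this note probably has in mind. The classify helper and the CharClassDemo class are hypothetical names used only for illustration. Two things worth noticing: an underscore is neither a letter nor a digit, so names need a separate check for it, and a newline counts as whitespace, so a tokenizer that treats '\n' as end of line (an assumption here) would have to test for it before skipping whitespace.

```java
// Hypothetical demo class, not part of the assignment.
class CharClassDemo {
    /** Hypothetical helper: reports how java.lang.Character classifies ch. */
    static String classify(char ch) {
        if (Character.isLetter(ch)) return "letter";
        if (Character.isDigit(ch)) return "digit";
        if (Character.isWhitespace(ch)) return "whitespace";
        return "other";
    }

    public static void main(String[] args) {
        char[] samples = { 'a', '7', ' ', '_', '+', '\n' };
        for (char ch : samples) {
            // Print the numeric code so whitespace characters stay visible.
            System.out.println((int) ch + " is " + classify(ch));
        }
    }
}
```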
The Token class should also include the following constructor and methods:
public Token(Type type, String value) -- constructor
@Override public boolean equals(Object o) -- tests
if two Tokens are equal
@Override public String toString() -- returns a String
representation of this Token.
Note: @Override tells the Java 5 compiler
that you are trying to override an inherited method, so that it can
warn you if you get the signature wrong.
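One possible shape for the Token class is sketched below. Several details are assumptions rather than requirements: the "TYPE:value" format used by toString is invented for this sketch (the provided JUnit tests may expect something else), value is assumed to be non-null, and hashCode is added only because it is good practice to keep it consistent with equals. The types are declared without "public" so they fit in one file.

```java
enum Type { NAME, NUMBER, SYMBOL, EOL }

class Token {
    public Type type;
    public String value;

    public Token(Type type, String value) {
        this.type = type;
        this.value = value;
    }

    /** Two Tokens are equal when both type and value match. */
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Token)) return false; // also rejects null
        Token other = (Token) o;
        return type == other.type && value.equals(other.value);
    }

    /** Not required by the assignment; kept consistent with equals. */
    @Override public int hashCode() {
        return 31 * type.hashCode() + value.hashCode();
    }

    /** Assumed format for this sketch: TYPE:value */
    @Override public String toString() {
        return type + ":" + value;
    }
}
```

Note that the parameter of equals is Object, not Token. Writing equals(Token t) would overload rather than override, which is exactly the mistake @Override is meant to catch.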
public class Tokenizer implements Iterator
The Tokenizer will have at least the following constructor and methods:
public Tokenizer(String input) -- constructor (sets
the string to be tokenized).
public boolean hasNext() -- returns true if
there are more tokens to be returned.
public Token next() -- returns the next token
from the string.
public void remove() -- throws an UnsupportedOperationException.
public void putBack(int howMany) -- steps back past
the previous howMany tokens (so they can be returned
again by subsequent calls to next).
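To make the putBack behavior concrete, here is a sketch that applies the same idea to plain Strings instead of Tokens. Everything in it is an assumption for illustration: the PutBackIterator name, the use of two stacks (one for items already returned, one for items waiting to be returned again), and the choice to throw IllegalArgumentException when asked to put back more items than have been returned.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Stack;

// Hypothetical class: wraps any Iterator<String> and adds putBack.
class PutBackIterator implements Iterator<String> {
    private final Iterator<String> source;
    private final Stack<String> returned = new Stack<String>(); // most recent on top
    private final Stack<String> pending = new Stack<String>();  // next to re-return on top

    PutBackIterator(Iterator<String> source) {
        this.source = source;
    }

    public boolean hasNext() {
        return !pending.isEmpty() || source.hasNext();
    }

    public String next() {
        // Items that were put back are returned before any new ones.
        String item = pending.isEmpty() ? source.next() : pending.pop();
        returned.push(item);
        return item;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    /** Steps back past the previous howMany items. */
    public void putBack(int howMany) {
        if (howMany < 0 || howMany > returned.size()) {
            throw new IllegalArgumentException("cannot put back " + howMany);
        }
        for (int i = 0; i < howMany; i++) {
            pending.push(returned.pop());
        }
    }
}

// Hypothetical demo class.
class PutBackDemo {
    public static void main(String[] args) {
        PutBackIterator it = new PutBackIterator(Arrays.asList("a", "b", "c").iterator());
        it.next();
        it.next();
        it.putBack(1);
        while (it.hasNext()) {
            System.out.println(it.next()); // prints b, then c
        }
    }
}
```

Popping from one stack and pushing onto the other reverses the order twice, so items come back in the order they were first returned.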
Comments:
You don't need a main method--all your testing can be done via
JUnit. Here are my JUnit tests, and you may
add more if you like; if you add any public methods to those required above,
you must provide JUnit tests for them.
Additional requirements:
- Implement your Tokenizer.next() method with a state machine.
- Define your states as an inner enum class within your Tokenizer class.
- Do not use StringTokenizer or regular expressions anywhere in this assignment.
- As you find tokens, put them on a java.util.Stack, and when you need to putBack tokens, get them from this stack.
- Write any additional code needed to pass all the JUnit tests that I gave
you with this assignment.
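The state-machine requirement above might be approached along the lines of the sketch below. It is not a complete Tokenizer: it has no Iterator methods and no stack, and it returns "TYPE:value" strings rather than Token objects to stay short. The class name StateMachineSketch, the method name scan, the three state names, the choice of '\n' as the end-of-line character, and returning null at the end of input are all assumptions made for illustration. The provided JUnit tests decide what is actually expected.

```java
enum Type { NAME, NUMBER, SYMBOL, EOL }

// Hypothetical class showing only the scanning step.
class StateMachineSketch {
    /** States as an inner enum, as the requirements suggest. */
    private enum State { START, IN_NAME, IN_NUMBER }

    private final String input;
    private int pos = 0; // index of the next unread character

    StateMachineSketch(String input) {
        this.input = input;
    }

    /** Returns the next token as "TYPE:value", or null when the input is used up. */
    String scan() {
        State state = State.START;
        StringBuilder value = new StringBuilder();
        while (pos < input.length()) {
            char ch = input.charAt(pos);
            switch (state) {
                case START:
                    if (ch == '\n') {                       // assumed end-of-line character
                        pos++;
                        return Type.EOL + ":";
                    } else if (Character.isWhitespace(ch)) {
                        pos++;                              // skip other whitespace
                    } else if (Character.isLetter(ch)) {
                        value.append(ch); pos++;
                        state = State.IN_NAME;
                    } else if (Character.isDigit(ch)) {
                        value.append(ch); pos++;
                        state = State.IN_NUMBER;
                    } else {
                        pos++;
                        return Type.SYMBOL + ":" + ch;      // any other single character
                    }
                    break;
                case IN_NAME:
                    if (Character.isLetterOrDigit(ch) || ch == '_') {
                        value.append(ch); pos++;
                    } else {
                        return Type.NAME + ":" + value;     // leave ch for the next call
                    }
                    break;
                case IN_NUMBER:
                    if (Character.isDigit(ch)) {
                        value.append(ch); pos++;
                    } else {
                        return Type.NUMBER + ":" + value;   // leave ch for the next call
                    }
                    break;
            }
        }
        // Input ended while a token was still being collected.
        if (state == State.IN_NAME) return Type.NAME + ":" + value;
        if (state == State.IN_NUMBER) return Type.NUMBER + ":" + value;
        return null;
    }
}
```

Each call starts in START and stops as soon as one token is complete, without consuming the character that ended it. That is why "-42" comes out as a SYMBOL followed by a NUMBER, matching the rule that a sign is not part of a number.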
Due date:
Tuesday, January 17, before midnight (zipped and submitted via Blackboard).