591 Assignment 8: Token Counter Clarifications
Fall 2004, David Matuszek
It seems that any interesting project has "edge cases"--cases that
are on some boundary, and that are easy to get wrong. These are seldom obvious
when you start the project, but become apparent as you get into it.
Token Counter is no exception.
Operators written as adjacent characters, such as +=, can and should be counted as a single token. However, operators that are separated, such as the indexing operator in myArray[index], or the ternary operator in max = x > y ? x : y;, must be counted as two separate tokens.
In myArray[index], there are two names and two punctuation marks. Technically, the brackets form a single indexing operator, but that's not for the tokenizer to decide. The closing bracket occurs some indefinite time after the opening bracket, so matching the two is a much harder problem, not suitable for a state machine. It isn't necessary for a tokenizer to be this complicated.
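The splitting rule for operators can be sketched in a few lines. This is only an illustration under stated assumptions, not the required design: the class name OperatorSplitSketch, the method tokens, and the short TWO_CHAR_OPS list are all hypothetical, and the sketch ignores numbers, strings, and comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names, adjacent two-character operators, and
// single symbols only. Numbers, strings, and comments are not handled.
class OperatorSplitSketch {
    // Assumed sample of two-character operators; a real list would be longer.
    private static final String[] TWO_CHAR_OPS = {"+=", "-=", "==", "<=", ">=", "&&", "||"};

    /** Splits the source text into token strings. */
    static List<String> tokens(String source) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                  // skip blanks between tokens
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;                        // a name runs to the last identifier character
                while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                result.add(source.substring(start, i));
            } else if (startsWithTwoCharOp(source, i)) {
                result.add(source.substring(i, i + 2)); // adjacent characters: one token
                i += 2;
            } else {
                result.add(String.valueOf(c));        // everything else: one token per symbol
                i++;
            }
        }
        return result;
    }

    private static boolean startsWithTwoCharOp(String s, int i) {
        for (String op : TWO_CHAR_OPS) {
            if (s.startsWith(op, i)) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, tokens("total += x") yields three tokens, while tokens("myArray[index]") yields four: two names and two brackets, with no attempt to pair the brackets.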
The Java Tutorial, Third Edition says that true, false, and null are reserved words, but are not keywords. Count them as keywords.
new and instanceof are operators. Count them as keywords.
If we count all of these as keywords, there are 51 keywords in Java 1.4. Java 1.5 adds the keyword enum, for a total of 52.
You should add enum to your list of keywords, if you don't already have it.
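One simple way to apply these keyword rules is a set lookup. The sketch below is an assumption about how you might organize it, not a prescribed design: the class name KeywordLookupSketch and the method isKeyword are hypothetical, and the list is deliberately partial, so extend it with the rest of your own keyword list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a keyword table; the list is intentionally incomplete.
class KeywordLookupSketch {
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "class", "if", "else", "while", "return",  // a few ordinary keywords
        "true", "false", "null",                    // reserved words, counted as keywords here
        "instanceof", "new",                        // operators that are spelled as words
        "enum"                                      // added in Java 1.5
    ));

    /** Returns true if the word should be counted as a keyword rather than a name. */
    static boolean isKeyword(String word) {
        return KEYWORDS.contains(word);
    }
}
```

Any word that scans as a name is checked against the set first; only words that are not in the set are counted as names.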
Technically, [] and () are operators; brackets are an indexing operator, and parentheses are both a method call operator and a casting operator. Count each bracket and each parenthesis as a separate punctuation mark.
The colon (:) appears in at least three places in Java, once as part of the ternary operator ?: and twice as simple punctuation. Don't try to sort this out. Count every colon as a punctuation mark and the question mark (?) as an operator.
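The rules for brackets, parentheses, colons, and question marks depend only on the character itself, never on context, so they reduce to a small table. Here is a minimal sketch, assuming a hypothetical class SymbolKindSketch with a hypothetical Kind enum; other symbols are left as OTHER for the rest of your tokenizer to decide.

```java
// Hypothetical sketch: classifies only the symbols discussed above.
class SymbolKindSketch {
    enum Kind { OPERATOR, PUNCTUATION, OTHER }

    /** Classifies a single symbol without looking at its surroundings. */
    static Kind classify(char c) {
        switch (c) {
            case '[': case ']':
            case '(': case ')':
                return Kind.PUNCTUATION;  // each bracket or parenthesis counts separately
            case ':':
                return Kind.PUNCTUATION;  // every colon, even inside ?:
            case '?':
                return Kind.OPERATOR;     // the question mark alone stands for the ternary operator
            default:
                return Kind.OTHER;        // not covered by this sketch
        }
    }
}
```

Because no context is consulted, this fits in a state machine: the colon in a ternary expression, a case label, or a labeled statement is treated identically.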
Numbers never begin with + or - (although these can occur within real literals, as in the exponent of 1.0e-5).
Numeric literals containing an exponent are always real. Octal and hexadecimal numbers are always integer.
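Once a numeric literal has been scanned, deciding whether it is an integer or a real can be done from its text alone. The sketch below is one possible approach under stated assumptions: the class NumberKindSketch and its classify method are hypothetical names, the input is assumed to be a complete, well-formed literal with no leading sign, and the treatment of the f and d suffixes goes beyond what is stated above.

```java
import java.util.Locale;

// Hypothetical sketch: assumes the argument is one well-formed numeric literal.
class NumberKindSketch {
    enum Kind { INTEGER, REAL }

    static Kind classify(String literal) {
        String s = literal.toLowerCase(Locale.ROOT);
        if (s.startsWith("0x")) {
            return Kind.INTEGER;          // hexadecimal: any 'e', 'f', or 'd' is a digit here
        }
        if (s.indexOf('.') >= 0 || s.indexOf('e') >= 0) {
            return Kind.REAL;             // a decimal point or an exponent makes it real
        }
        if (s.endsWith("f") || s.endsWith("d")) {
            return Kind.REAL;             // float/double suffix (an addition of this sketch)
        }
        return Kind.INTEGER;              // plain decimal or octal digits, possibly with an L suffix
    }
}
```

The hexadecimal check has to come first, because a literal such as 0x1E contains the letter e without having an exponent.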
If I notice or hear about any more special cases, I will post something.