CIT 591 Assignment 8: Token Counter Clarifications
Fall 2004, David Matuszek
It seems that any interesting project has "edge cases"--cases that
are on some boundary, and that are easy to get wrong. These are seldom obvious
when you start the project, but become apparent as you get into it. JavaTokenCounter
is no exception.
Guiding Principles
In myArray[index], there are two names and two punctuation marks. Technically, the brackets form a single indexing operator, but that's not for the tokenizer to decide. The closing bracket occurs some indefinite time after the opening bracket, so pairing the two is a much harder problem, not suitable for a state machine. It isn't necessary for a tokenizer to be this complicated.
Operators whose characters are adjacent, such as +=, can and should be counted as a single token. However, operators that are separated, such as the indexing operator [] in myArray[index], or the ternary operator ?: in max = x > y ? x : y; must be counted as two separate tokens.
Specific Cases
The Java Tutorial, Third Edition says that true,
false, and null are reserved words, but are not keywords.
Count them as keywords.
Technically, new and instanceof are operators. Count
them as keywords.
If we count true, false, null, new,
and instanceof, there are 51 keywords in Java
1.4. Java 1.5 adds the keyword enum, for a total of 52.
You should add enum to your list of keywords, if you don't already
have it.
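As a minimal sketch, the special cases above could be folded into a keyword lookup like the one below. The class and method names are hypothetical, and the set holds only the six words discussed here; a full counter would also need the remaining Java keywords in the same set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: not part of the assignment's required design.
final class SpecialKeywords {
    // Only the words this handout singles out; the other keywords are omitted here.
    static final Set<String> WORDS = new HashSet<>(Arrays.asList(
            "true", "false", "null",   // reserved words, counted as keywords
            "new", "instanceof",       // technically operators, counted as keywords
            "enum"));                  // added in Java 1.5

    // Java is case-sensitive, so "True" or "NULL" would be ordinary names.
    static boolean isSpecialKeyword(String word) {
        return WORDS.contains(word);
    }
}
```

Because the lookup is an exact string match, a name such as newValue or nullable is not mistaken for a keyword as long as the tokenizer passes in whole words.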
Technically, [] and () are operators; brackets are
an indexing operator, and parentheses are both a method call operator and a
casting operator. Count each bracket and each parenthesis as a separate punctuation
mark.
The colon (:) appears in at least three places in Java, once as
part of the ternary operator ?: and twice as simple punctuation. Don't
try to sort this out. Count every colon as a punctuation mark and the question
mark (?) as an operator.
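The bracket, parenthesis, colon, and question mark rules can be sketched as a small symbol counter. This is only an illustration under stated assumptions: the class name, the method name, and the exact punctuation set beyond brackets, parentheses, and the colon are not from the assignment, and the sketch skips names, numbers, comments, and string literals rather than handling them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: counts only punctuation marks and operators.
final class SymbolCounter {
    // Brackets, parentheses, and the colon come from the handout;
    // braces, semicolon, and comma are an assumption of this sketch.
    static final String PUNCTUATION = "[](){};,:";
    static final String OPERATOR_CHARS = "+-*/%=<>!&|^~?";
    static final Set<String> MULTI_CHAR_OPERATORS = new HashSet<>(Arrays.asList(
            "++", "--", "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=",
            "==", "!=", "<=", ">=", "&&", "||", "<<", ">>", ">>>",
            "<<=", ">>=", ">>>="));

    // Returns {punctuationCount, operatorCount}; every other character is skipped.
    static int[] count(String source) {
        int punctuation = 0;
        int operators = 0;
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (PUNCTUATION.indexOf(c) >= 0) {
                punctuation++;          // each bracket, parenthesis, or colon counts alone
                i++;
            } else if (OPERATOR_CHARS.indexOf(c) >= 0) {
                int length = 1;
                // Prefer the longest adjacent operator, so += is one token, not two.
                for (int n = 4; n >= 2; n--) {
                    if (i + n <= source.length()
                            && MULTI_CHAR_OPERATORS.contains(source.substring(i, i + n))) {
                        length = n;
                        break;
                    }
                }
                operators++;
                i += length;
            } else {
                i++;                    // names, digits, whitespace: not counted here
            }
        }
        return new int[] {punctuation, operators};
    }
}
```

Under these assumptions, SymbolCounter.count("myArray[index]") yields two punctuation marks and no operators, while SymbolCounter.count("max = x > y ? x : y;") yields two punctuation marks (the colon and the semicolon) and three operators (=, >, and ?).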
Numbers never begin with + or - (although these can
occur within real literals, as in 1.5e-3).
Numeric literals containing an exponent are always real. Octal and hexadecimal numbers are always integer.
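One way to apply the numeric rules is sketched below. The class and method names are hypothetical, and the sketch assumes it is handed a well-formed, unsigned Java 1.4 literal that the tokenizer has already isolated. Treating a decimal point or an f/d suffix as "real" follows ordinary Java rules rather than anything stated in this handout.

```java
// Hypothetical helper: classifies an already-scanned numeric literal.
final class NumberKind {
    // Returns "integer" or "real" for a well-formed literal with no leading sign.
    static String classify(String literal) {
        String s = literal.toLowerCase();
        // Check hexadecimal first: in 0x1E the 'e' is a digit, not an exponent.
        if (s.startsWith("0x")) {
            return "integer";
        }
        // A decimal point or an exponent makes the literal real.
        if (s.indexOf('.') >= 0 || s.indexOf('e') >= 0) {
            return "real";
        }
        // A float or double suffix also makes it real (for example 1f).
        if (s.endsWith("f") || s.endsWith("d")) {
            return "real";
        }
        // Everything else is decimal or octal, possibly with an L suffix.
        return "integer";
    }
}
```

Since numbers never begin with a sign, input such as -5 would reach this method as 5, with the - already counted as an operator. A sign inside an exponent, as in 6.02e+23, stays part of the literal.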
If I notice or hear about any more special cases, I will post something.