CIT 591 Assignment 8: Token Counter
Fall 2004, David Matuszek
Purposes of this assignment:
General idea of the assignment:
Categorize and count the tokens in an input Java file.
Details:
Here are the kinds of "tokens" that can appear in a Java program:
- Names, such as String and println, and user-defined names. Do not try to distinguish between user-defined names and the thousands of Sun-defined names!
- Keywords.
- Character literals. These are not just 'x', because you can have escaped characters (such as '\n') and characters represented by their numeric codes (such as '\015'). Be sure to recognize all valid character literals.
- Integer literals.
- Real literals. Be sure to recognize float literals as well as double literals.
- String literals.
- Operators (such as =, +, !=, +=, <, <<, <<=). Notice that each of these counts as one operator, not one per character. For purposes of this assignment, we will not count the colon (:) as an operator.
- Punctuation.
- C-style comments, which begin with /* and end with */.
- Javadoc comments, which begin with /** and end with */.
- End-of-line comments, which begin with // and extend to the end of the line.
This isn't the only possible set of token categories, but it will do.
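To illustrate the rule that a multi-character operator counts once, here is a minimal sketch that always takes the longest operator starting at the current position. It uses a plain table lookup rather than a state machine, works on a String instead of a DataSource, and knows only a small illustrative subset of operators. The class and method names are hypothetical, and the sketch ignores string literals and comments.

```java
// Hypothetical sketch: longest-match operator counting by table lookup.
class OperatorMatchDemo {
    // Illustrative subset of Java operators, not the full set.
    private static final String[] OPERATORS = {
        "=", "==", "+", "++", "+=", "!", "!=",
        "<", "<=", "<<", "<<=", ">", ">>", ">>>"
    };

    /** Returns the length of the longest operator starting at pos, or 0 if none. */
    static int longestOperatorAt(String text, int pos) {
        int best = 0;
        for (String op : OPERATORS) {
            if (op.length() > best && text.startsWith(op, pos)) {
                best = op.length();
            }
        }
        return best;
    }

    /** Counts operators, skipping any character that does not start one. */
    static int countOperators(String text) {
        int count = 0;
        int pos = 0;
        while (pos < text.length()) {
            int len = longestOperatorAt(text, pos);
            if (len > 0) {
                count++;      // one token, however many characters it has
                pos += len;
            } else {
                pos++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOperators("a += b << 2"));
    }
}
```

With this table, "a += b << 2" contains two operators (+= and <<), not four.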
There are some important methods in the Character class that you
should look at and use.
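As a rough illustration of the kind of Character methods that may be useful here, the sketch below classifies single characters. The class CharKindDemo and its kindOf method are hypothetical names invented for this example; only the java.lang.Character calls are real API.

```java
// Hypothetical demo class; only the java.lang.Character calls are real API.
class CharKindDemo {
    /** Describes how a tokenizer might treat a single character. */
    static String kindOf(char ch) {
        // Check the end-of-input marker first: Character.isJavaIdentifierPart('\0')
        // is true, because '\0' counts as an "ignorable" identifier character.
        if (ch == '\0') return "end of input";
        if (Character.isWhitespace(ch)) return "whitespace";
        if (Character.isDigit(ch)) return "digit";
        if (Character.isJavaIdentifierStart(ch)) return "name start";
        if (Character.isJavaIdentifierPart(ch)) return "name part";
        return "other";
    }

    public static void main(String[] args) {
        char[] samples = {'x', '_', '$', '7', ' ', '+', '\0'};
        for (char ch : samples) {
            System.out.println((int) ch + " -> " + kindOf(ch));
        }
    }
}
```

Note the ordering: digits are tested before isJavaIdentifierPart, since a digit is also a legal identifier part, and the null character is tested before everything else.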
The primary purpose of this program is to count tokens in files and display the counts in a GUI, but if it combines all those operations, JUnit testing is very difficult. So we'll use the MVC (Model-View-Controller) design pattern to divorce the actual computations from the input and output operations, as follows:
- A class, JavaTokenCounter, will get characters from a "data source," and provide methods to retrieve (and clear) the token counts. It won't do any input/output.
- The data source will be either a string (for testing), StringDataSource, or a file (for doing the real work), FileDataSource. So that these can be used interchangeably, both will implement an interface DataSource.
- JavaTokenCounter will have a main method that (with the help of additional methods) sets up the GUI, chooses files, and displays token counts. The main method will call a method public void countTokens(DataSource ds) to do the counting, and the class will have various getter methods to retrieve the results.
- JavaTokenCounterTest will do JUnit testing on JavaTokenCounter, using countTokens with a StringDataSource parameter.
Classes and interfaces
Please be really sure to use the exact same names for everything as listed here, because we will be doing our own unit testing of your code.
interface DataSource
I'll supply the code for this interface. Here it is:
public interface DataSource {
    char read();
    void unread(char ch);
}
You get to supply the Javadoc comments.
Your implementations of read() should return a null character, '\0', when there are no more characters to be returned. (You can't return a -1.)
The unread(char ch) method is useful because you often discover you are at the end of a token by getting a character that belongs to the next token. For example, in count++, you know you are finished getting the name count when you get the '+'; but you need to put that plus back so you can use it in getting the next token.
class StringDataSource implements DataSource
This has a constructor that takes a String as an argument.
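One possible shape for such a class is sketched below. The field names and the choice of a StringBuilder as the pushback store are assumptions, not requirements. The interface is repeated (without the public modifier) only so that the sketch compiles as a single file.

```java
// The interface from the assignment, repeated so this sketch is self-contained.
interface DataSource {
    char read();
    void unread(char ch);
}

// A possible StringDataSource; field names are illustrative assumptions.
class StringDataSource implements DataSource {
    private final String text;
    private int position = 0;
    // Characters that were put back; the most recently unread one is last.
    private final StringBuilder pushedBack = new StringBuilder();

    public StringDataSource(String text) {
        this.text = text;
    }

    public char read() {
        if (pushedBack.length() > 0) {
            int last = pushedBack.length() - 1;
            char ch = pushedBack.charAt(last);
            pushedBack.deleteCharAt(last);
            return ch;
        }
        if (position >= text.length()) {
            return '\0'; // no more characters
        }
        return text.charAt(position++);
    }

    public void unread(char ch) {
        pushedBack.append(ch);
    }

    public static void main(String[] args) {
        DataSource ds = new StringDataSource("count++");
        char ch = ds.read();
        while (Character.isJavaIdentifierPart(ch) && ch != '\0') {
            ch = ds.read();
        }
        ds.unread(ch); // put back the '+' that ended the name
        System.out.println("next token starts with " + ds.read());
    }
}
```

The main method replays the count++ case: the '+' that ends the name is put back so that it is the first character of the next token.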
class FileDataSource implements DataSource
This has a no-argument constructor that sets up a file for use as a data source. The constructor should ask the user for a file by calling JFileChooser.
Once you have a file, see my SwingExamples.zip (unchanged from last time) to see how to get a BufferedReader for it; then use this as a parameter to the constructor for PushbackReader. See the Java API for information on PushbackReader--notice that it works with ints, not chars, so you will need to do a little casting in your class.
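The casting mentioned above might look like the following sketch. To keep the example independent of the GUI, it wraps any Reader instead of opening a file with JFileChooser, so it is not the required FileDataSource. The class name ReaderDataSource and its error handling are assumptions.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

// The interface from the assignment, repeated so this sketch is self-contained.
interface DataSource {
    char read();
    void unread(char ch);
}

// Hypothetical class: shows only the int/char conversions around PushbackReader.
class ReaderDataSource implements DataSource {
    private final PushbackReader reader;

    ReaderDataSource(Reader source) {
        // The one-argument constructor gives a pushback buffer of one character.
        this.reader = new PushbackReader(source);
    }

    public char read() {
        try {
            int code = reader.read();               // -1 signals end of stream
            return (code == -1) ? '\0' : (char) code;
        } catch (IOException e) {
            return '\0'; // assumption: treat an I/O error as end of input
        }
    }

    public void unread(char ch) {
        try {
            reader.unread(ch);                      // char widens to int here
        } catch (IOException e) {
            throw new IllegalStateException("could not push back character", e);
        }
    }

    public static void main(String[] args) {
        DataSource ds = new ReaderDataSource(new StringReader("ab"));
        char first = ds.read();
        ds.unread(first);
        System.out.println(ds.read());
    }
}
```

In a real FileDataSource the Reader would be the BufferedReader obtained for the chosen file.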
public class JavaTokenCounter
This class should have (at least) the following methods:
public void countTokens(DataSource ds) - Counts all the various types of tokens in the DataSource, and stores the counts in instance variables. If called repeatedly, the new counts should be added to, not replace, the previous counts.
public void clearAllCounts() - Sets all token counts to zero.
getNameCount(), getKeywordCount(), getCharacterLiteralCount(), getIntegerLiteralCount(), getRealLiteralCount(), getStringLiteralCount(), getOperatorCount(), getPunctuationCount(), getCStyleCommentCount(), getJavadocCommentCount(), and getEndOfLineCommentCount() - All of these should be public int. Don't count the EOF, and don't supply methods to count various combinations of totals--the calling program can combine totals if it wants to.
public static void main(String[] args) - When used as an application, the program should ask for a file as a data source, count the tokens in it, and display the following results in a GUI:
- Each of the above token types (not including EOF),
- A comment count (combining the three comment types),
- A count of non-comments (combining the other eight types, not including EOF), and
- A total token count (not including EOF).
After reading each file, the user should be able to:
- Choose another file, count tokens, and add them to the existing counts,
- Clear the counts, or
- Quit the program.
public class JavaTokenCounterTest extends TestCase
Provides JUnit tests for the JavaTokenCounter class. Use StringDataSource, not FileDataSource, in your testing (because unit testing should be completely automatic). It's a good idea to write this class first.
You should assume that the file you are reading contains a correct Java program. Your program shouldn't crash on any input, but if it's not correct Java, don't worry about how to count things in it.
How to do it:
Use state machines.
You could do this with a single (very large) state machine, but it's more convenient
to break it up into multiple state machines. For example, I have a method parseNumber()
which is called when I first encounter a digit; it implements a state machine
with states INTEGER_PART, FRACTION_PART, EXPONENT_PART,
and EXPONENT_DIGITS_PART, to keep track of which part of the number
I'm in. (It doesn't yet deal with octal or hexadecimal numbers.) I'd also recommend
state machines for operators and possibly for character literals, string
literals, and comments.
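A number-scanning state machine along these lines could be sketched as follows. Only the four state names come from the description above. The method signature, the Result class, and the decision to scan a String by index (rather than a DataSource) are assumptions. Like the method described, the sketch ignores octal and hexadecimal numbers, and it also ignores suffixes such as f and L.

```java
// Hypothetical sketch: the state names come from the text, everything else is assumed.
class NumberScanDemo {
    enum State { INTEGER_PART, FRACTION_PART, EXPONENT_PART, EXPONENT_DIGITS_PART }

    /** Where the literal ends (exclusive) and whether it is a real literal. */
    static final class Result {
        final int end;
        final boolean isReal;
        Result(int end, boolean isReal) { this.end = end; this.isReal = isReal; }
    }

    /** Scans a decimal literal whose first digit is at text.charAt(start). */
    static Result scanNumber(String text, int start) {
        State state = State.INTEGER_PART;
        boolean isReal = false;
        int i = start;
        while (i < text.length()) {
            char ch = text.charAt(i);
            boolean consumed = false;
            switch (state) {
                case INTEGER_PART:
                    if (Character.isDigit(ch)) {
                        consumed = true;
                    } else if (ch == '.') {
                        state = State.FRACTION_PART; isReal = true; consumed = true;
                    } else if (ch == 'e' || ch == 'E') {
                        state = State.EXPONENT_PART; isReal = true; consumed = true;
                    }
                    break;
                case FRACTION_PART:
                    if (Character.isDigit(ch)) {
                        consumed = true;
                    } else if (ch == 'e' || ch == 'E') {
                        state = State.EXPONENT_PART; consumed = true;
                    }
                    break;
                case EXPONENT_PART: // just saw 'e': expect a sign or the first digit
                    if (ch == '+' || ch == '-' || Character.isDigit(ch)) {
                        state = State.EXPONENT_DIGITS_PART; consumed = true;
                    }
                    break;
                case EXPONENT_DIGITS_PART:
                    if (Character.isDigit(ch)) {
                        consumed = true;
                    }
                    break;
            }
            if (!consumed) {
                break; // this character belongs to the next token
            }
            i++;
        }
        return new Result(i, isReal);
    }

    public static void main(String[] args) {
        Result r = scanNumber("3.14e-2;", 0);
        System.out.println("end=" + r.end + " real=" + r.isReal);
    }
}
```

The character that stops the loop is left unconsumed, which plays the same role as calling unread on a DataSource.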
If you use state machines properly, and choose your state names carefully,
this assignment turns out to be easier than you would expect. To give you an
example of what I mean, here's what my FractionCalculator program
does when a digit is clicked:
public String clickDigit(char digit) {
    switch (state) {
        case STARTING_NUMERATOR:
            setDisplay(digit);
            state = GETTING_NUMERATOR;
            break;
        case GETTING_NUMERATOR:
            appendToDisplay(digit);
            break;
        case STARTING_DENOMINATOR:
            state = GETTING_DENOMINATOR;
            appendToDisplay(digit);
            break;
        case GETTING_DENOMINATOR:
            appendToDisplay(digit);
            break;
        case COMPLETED_FRACTION:
            haveFirstNumber = true;
            setDisplay(digit);
            state = GETTING_NUMERATOR;
            break;
        case ERROR:
            break;
        default:
            assert false;
    }
    return displayString;
}
(This has no relevance to the current assignment, other than as a reminder of what state machines are like.)
For this assignment, you may not use the classes StringTokenizer,
StreamTokenizer, or Pattern.
Grading:
You should work with a partner and, as before, you both will get the same grade
on the project. Ideally each of you should write at least one method for the
JavaTokenCounter class that uses a state machine.
Due date:
Wednesday, December 1, before midnight.