| CIT
594 Creating
a Parser for Tree Expressions Spring 2008, David Matuszek |
If you have been working on the assignment, you will have noticed by now that
the parse method is over half the work of the assignment.
There are many possible ways to write this method; most of them result in
horrible code that is extremely difficult to debug. However, if you apply
the lessons you (should have) learned from the Recognizer
assignment, it is possible to write and debug the method in just a few
hours.
From the Building Trees assignment:
public static Tree<String> parse(String treeDescription)- Translates a String description of a tree into a
Tree<String>object. ThetreeDescriptionhas the formvalue(child child ... child), where avalueis any sequence of characters not containing parentheses or whitespace, and eachchildis either just a (String) value or is another treeDescription. Whitespace is optional except where needed to separate values.
For example, the Stringbecomes the Tree:"one (two three (four five(six seven eight)) )"
The input string can be described as follows:
<treeDescription> ::= <value> [ "(" { <treeDescription> } ")" ]
That is, a <treeDescription> is either a single <value>,
or a <value> followed by any number of <treeDescription>s
enclosed in parentheses. The <value> is any sequence of
characters not containing parentheses or whitespace (your tokenizer should
do this for you).
Using the techniques from the Recognizer assignment, you should be able to write a method to write a method to recognize tree descriptions. Here is some pseudocode to help you out:
Tree parse(input_string) { Create a tokenizer for the input_string; return parse2(tokenizer); } Tree parse2(tokenizer) { Get a value from the tokenizer if you can, or return null if you can't; Make a tree with this value in the root; If the next symbol is an open parenthesis { While parse2(tokenizer) returns a new tree { Add the tree as a child to the root; } Make sure the next symbol is a close parenthesis; } Return the root; }
A Recognizer method would return true or false. Since
we want to return a tree when we succeed, we'll use null in place of false.
The red lines above have nothing to do with recognizing the tree description; they are what you need to add to create the tree from the description.
This isn't all there is to writing the parse method. You still need a few
helper methods. You will probably want something like the name() and symbol(String) methods from the Recognizer assignment, modified as necessary for the current
assignment.
You need to tokenize the input argument of the parse method. What
you would like is a tokenizer that returns three kinds of things (in addition
to something to mark the end of input): Left parentheses, right parentheses,
and "values" consisting of non-whitespace, non-parenthesis characters. Also,
you will need to be able to "push back" a token that you have read and don't
want.
You have three obvious choices:
StreamTokenizer.
Recognizer assignment.
StringTokenizer.
My approach was the third of these: Use a StringTokenizer, but add pushback
capability to it. Here's the code:
static class PushbackStringTokenizer { private StringTokenizer tokenizer; private String pushedValue = null; PushbackStringTokenizer(String input) { tokenizer = new StringTokenizer(input, "\t\n\r\f ()", true); pushedValue = null; } boolean hasNext() { return pushedValue != null || tokenizer.hasMoreTokens(); } String next() { String temp = pushedValue; if (temp == null && tokenizer.hasMoreTokens()) { temp = tokenizer.nextToken(); } pushedValue = null; return temp; } void pushBack(String token) { pushedValue = token; } }
I've removed the Javadoc from this code in order to make it more readable; when you already know what the methods are supposed to do, the Javadoc just gets in the way. However, notice the following:
next() when there are
no more tokens, it will just return null rather than throw
an exception. I think this makes it easier to write and easier to use. next() method
will return whitespace as tokens, which you don't want. Here are three
possible solutions, not all equally good:
next(), put it in a loop until
you get a non-whitespace token.next() until
it gets a non-whitespace token, then returns it.next() method so that it never returns whitespace.