CIT 594 Creating a Parser for Tree Expressions
Spring 2008, David Matuszek

If you have been working on the assignment, you will have noticed by now that the parse method is over half the work of the assignment. There are many possible ways to write this method; most of them result in horrible code that is extremely difficult to debug. However, if you apply the lessons you (should have) learned from the Recognizer assignment, it is possible to write and debug the method in just a few hours.

The Parser

From the Building Trees assignment:

public static Tree<String> parse(String treeDescription)
Translates a String description of a tree into a Tree<String> object. The treeDescription has the form value(child child ... child), where a value is any sequence of characters not containing parentheses or whitespace, and each child is either just a (String) value or is another treeDescription. Whitespace is optional except where needed to separate values.

For example, the String "one (two three (four five(six seven eight)) )" becomes the Tree:
parsed tree
 

The input string can be described as follows:

<treeDescription> ::= <value> [ "(" { <treeDescription> } ")" ]

That is, a <treeDescription> is either a single <value>, or a <value> followed by any number of <treeDescription>s enclosed in parentheses. The <value> is any sequence of characters not containing parentheses or whitespace (your tokenizer should do this for you).

Using the techniques from the Recognizer assignment, you should be able to write a method to write a method to recognize tree descriptions. Here is some pseudocode to help you out:

Tree parse(input_string) {
    Create a tokenizer for the input_string;
    return parse2(tokenizer);
}


Tree parse2(tokenizer) {
    Get a value from the tokenizer if you can,
        or return null if you can't;
    Make a tree with this value in the root;
    If the next symbol is an open parenthesis {
        While parse2(tokenizer) returns a new tree {
            Add the tree as a child to the root;
        }
        Make sure the next symbol is a close parenthesis;
    }
    Return the root;
}

A Recognizer method would return true or false. Since we want to return a tree when we succeed, we'll use null in place of false.

The red lines above have nothing to do with recognizing the tree description; they are what you need to add to create the tree from the description.

This isn't all there is to writing the parse method. You still need a few helper methods. You will probably want something like the name() and symbol(String) methods from the Recognizer assignment, modified as necessary for the current assignment.

The Tokenizer

You need to tokenize the input argument of the parse method. What you would like is a tokenizer that returns three kinds of things (in addition to something to mark the end of input): Left parentheses, right parentheses, and "values" consisting of non-whitespace, non-parenthesis characters. Also, you will need to be able to "push back" a token that you have read and don't want.

You have three obvious choices:

My approach was the third of these: Use a StringTokenizer, but add pushback capability to it. Here's the code:

static class PushbackStringTokenizer {
    private StringTokenizer tokenizer;
    private String pushedValue = null;
    
    PushbackStringTokenizer(String input) {
        tokenizer = new StringTokenizer(input, "\t\n\r\f ()", true);
        pushedValue = null;
    }
    
    boolean hasNext() {
        return pushedValue != null || tokenizer.hasMoreTokens();
    }
    
    String next() {
        String temp = pushedValue;
        if (temp == null && tokenizer.hasMoreTokens()) {
            temp = tokenizer.nextToken();
        }
        pushedValue = null;
        return temp;
    }
    
    void pushBack(String token) {
        pushedValue = token;
    }
}

I've removed the Javadoc from this code in order to make it more readable; when you already know what the methods are supposed to do, the Javadoc just gets in the way. However, notice the following: