CIT 594 Advice for Parser Authors
Spring 2006, David Matuszek

The current assignment (writing a Parser) is probably the most challenging one yet. Here are some things you may find helpful. Not all of this advice is specific to writing parsers, but a lot of it is.

Never put an EOL on the stack

You will never have an EOL (end of line, '\n') in your parse tree. If you put EOLs on the stack, sooner or later you'll have to take them off again, and in the meantime they just get in the way and confuse things.

Test for what isn't on the stack

When you call a parse method, you expect it to build an appropriate tree and put it on the stack. That's fine; test and make sure you built the correct tree. However, you need to be careful that you have removed from the stack all the pieces that went into building that tree.

For example, if you build a block and put it on the stack, it's easy to accidentally leave the opening bracket ('[') beneath it on the stack. Later on--maybe much later--this will cause mysterious problems.

The code will look something like this:

assertEquals(expectedStackSize, parser.stack.size());

In my code I replaced an older assertStackTop(Tree) method, which checks whether the given Tree is the top thing on the stack, with assertStackIs(Tree), which checks that the given Tree is the only thing on the stack.

Create extra tokens

Most of the nodes on your parse tree will have values that came directly from the input, such as while, define, and red. However, the parse tree will also contain some values that did not come from the input, such asprocedures and block (from "[...]"). You can construct these tokens ahead of time and just use them as needed. For example:

Token parametersToken = new Token(Type.NAME, "parameters");

You can count these new tokens as either NAMEs or KEYWORDs, whichever seems most convenient to you.

Get rid of old code

Sure, you put a lot of work into that method, but it never worked properly; or if it did, you never actually used it. It hurts to throw away all that work. But face it--it's just cluttering up your program. Eclipse even tells you, "The method (mumble) is never used locally."

If it's confusing, replace it

For example, lots of people have had trouble with my makeTree method. I've had trouble with it. So here's a version that (in my opinion) is much easier to use--and more general, since now we have varargs.

    /**
     * Removes the designated root and its children from the stack,
     * creates a new Tree from them, and puts this new Tree on the stack.
     * 
     * @param rootIndex The index (1-based) on the stack of the root element. 
     * @param childIndices The indices of the children, in the order they
     * should appear as children.
     */
    private void makeTree(int rootIndex, int... childIndices) {
        Tree root = get(rootIndex);
        for (int index : childIndices) {
            root.addChild(get(index));
        }
        for (int i = 0; i < 1 + childIndices.length; i++) {
            stack.pop();
        }
        stack.push(root);
    }
	
    /**
     * Returns the index-th Tree in the global variable stack,
     * where the index of the top of the stack is 1.
     * 
     * @param index The index into the stack.
     * @return The Tree at that index.
     */
    private Tree get(int index) {
        // Because the Stack inherits get(int) from Vector, we have
        // to convert the stack index to a Vector index.
        return stack.get(stack.size() - index);
    }

Write helper methods

Here are some very good reasons to write helper methods:

  1. You need to do the same thing in multiple places, so Don't Repeat Yourself.
  2. It was hard to get the code right, so isolate it in a method you can test thoroughly.
  3. Your method was too long to see all at once, but you could break it up into meaningful chunks.

In your ParserTest class, you need to call the Parser to build trees, then compare them against the "correct" trees. Where do these correct trees come from? You have to build them by hand. That can be an awfully large amount of work. But if you're handy with recursion, you might come up with a simpler way to build your "expected" trees. Like this:

    static Token openBraceToken = new Token(Type.SYMBOL, "{");
    static Token closeBraceToken = new Token(Type.SYMBOL, "}");
    static int indentCount;
    
    /**
     * Creates a Tree from a given String. The string has the form
     * "root { subtree1 subtree2 ... subtreeN }", where each subtree
     * has the same form.
     * 
     * For example, "+{2 *{3 4}}" yields the tree
     * <pre>    +
     *         / \
     *        2   *
     *           / \
     *          3   4  </pre>
     * 
     * @param input The String to turn into a Tree.
     * @return The newly created Tree.
     */
    private static Tree tree(String input) {
        assert input != null;
        LogoTokenizer tokenizer = new LogoTokenizer(input);
        assert tokenizer.hasNext();
        Tree root = new Tree(tokenizer.next());
        addChildrenIfAny(root, tokenizer);
        return root;
    }
    
    private static void addChildrenIfAny(Tree root, Tokenizer tokenizer) { 
        Token token = (Token) tokenizer.next();
        if (token.equals(openBraceToken)) {
            token = (Token) tokenizer.next();
            while (!token.equals(closeBraceToken)) {
                Tree child = new Tree(token);
                root.addChild(child);
                addChildrenIfAny(child, tokenizer);
                token = (Token) tokenizer.next();
            }
            return;
        }
        else { // No children
            tokenizer.putBack(1);
            return;
        }
    }

(Yes, you can use these methods.)

Do thorough JUnit testing

You have to be sure you are building the right trees. The hard part is building the "correct" trees to test against. The above code should help a lot with this.

Example:

    use("move 5 \n");
    assertTrue(parser.command());
    assertStackIs(tree("move{5}"));

The LogoTokenizer class

I went over my code in class the other day. I won't give you the complete code, but if you would like to do things the same way I did, here's the outline:

 public class LogoTokenizer extends Tokenizer implements Iterable { 

    
     public LogoTokenizer(String inputString); // Constructor
    
     static boolean isKeyword(String word);
    
     @Override
     public Object next();

     public Iterator iterator();
    
     private class LogoIterator implements Iterator {
        
         LogoIterator(String string); // Constructor
        
         public boolean hasNext();

         public Object next();

         public void remove();
     }
 }