| CIT
594 Advice for Parser Authors Spring 2006, David Matuszek |
The current assignment (writing a Parser) is probably the most challenging one yet. Here are some things you may find helpful. Not all of this advice is specific to writing parsers, but a lot of it is.
You will never have an EOL (end of line, '\n') in
your parse tree. If you put EOLs on the stack, sooner or later
you'll have to take them off again, and in the meantime they just get in the
way and confuse things.
When you call a parse method, you expect it to build an appropriate tree and put it on the stack. That's fine; test and make sure you built the correct tree. However, you need to be careful that you have removed from the stack all the pieces that went into building that tree.
For example, if you build a block and put it on the stack, it's easy to accidentally
leave the opening bracket ('[') beneath it on the stack. Later
on--maybe much later--this will cause mysterious problems.
The code will look something like this:
In my code I replaced an older
assertEquals(expectedStackSize, parser.stack.size());
assertStackTop(Tree) method, which
checks whether the given Tree is the top thing on the stack, with assertStackIs(Tree),
which checks that the given Tree is the only thing on the stack.
Most of the nodes on your parse tree will have values that came directly from
the input, such as while, define, and red.
However, the parse tree will also contain some values that did not come from
the input, such asprocedures and block (from "[...]").
You can construct these tokens ahead of time and just use them as needed. For
example:
Token parametersToken = new Token(Type.NAME, "parameters");
You can count these new tokens as either NAMEs or KEYWORDs,
whichever seems most convenient to you.
Sure, you put a lot of work into that method, but it never worked properly; or if it did, you never actually used it. It hurts to throw away all that work. But face it--it's just cluttering up your program. Eclipse even tells you, "The method (mumble) is never used locally."
For example, lots of people have had trouble with my makeTree
method. I've had trouble with it. So here's a version that (in my opinion)
is much easier to use--and more general, since now we have varargs.
/** * Removes the designated root and its children from the stack, * creates a new Tree from them, and puts this new Tree on the stack. * * @param rootIndex The index (1-based) on the stack of the root element. * @param childIndices The indices of the children, in the order they * should appear as children. */ private void makeTree(int rootIndex, int... childIndices) { Tree root = get(rootIndex); for (int index : childIndices) { root.addChild(get(index)); } for (int i = 0; i < 1 + childIndices.length; i++) { stack.pop(); } stack.push(root); } /** * Returns the index-th Tree in the global variablestack, * where the index of the top of the stack is 1. * * @param index The index into the stack. * @return The Tree at that index. */ private Tree get(int index) { // Because the Stack inherits get(int) from Vector, we have // to convert the stack index to a Vector index. return stack.get(stack.size() - index); }
Here are some very good reasons to write helper methods:
In your ParserTest class, you need to call the Parser
to build trees, then compare them against the "correct" trees. Where
do these correct trees come from? You have to build them by hand. That can be
an awfully large amount of work. But if you're handy with recursion, you might
come up with a simpler way to build your "expected" trees. Like this:
static Token openBraceToken = new Token(Type.SYMBOL, "{"); static Token closeBraceToken = new Token(Type.SYMBOL, "}"); static int indentCount; /** * Creates a Tree from a given String. The string has the form * "root { subtree1 subtree2 ... subtreeN }", where each subtree * has the same form. * * For example, "+{2 *{3 4}}" yields the tree * <pre> + * / \ * 2 * * / \ * 3 4 </pre> * * @param input The String to turn into a Tree. * @return The newly created Tree. */ private static Tree tree(String input) { assert input != null; LogoTokenizer tokenizer = new LogoTokenizer(input); assert tokenizer.hasNext(); Tree root = new Tree(tokenizer.next()); addChildrenIfAny(root, tokenizer); return root; } private static void addChildrenIfAny(Tree root, Tokenizer tokenizer) { Token token = (Token) tokenizer.next(); if (token.equals(openBraceToken)) { token = (Token) tokenizer.next(); while (!token.equals(closeBraceToken)) { Tree child = new Tree(token); root.addChild(child); addChildrenIfAny(child, tokenizer); token = (Token) tokenizer.next(); } return; } else { // No children tokenizer.putBack(1); return; } }
(Yes, you can use these methods.)
You have to be sure you are building the right trees. The hard part is building the "correct" trees to test against. The above code should help a lot with this.
Example:
use("move 5 \n");
assertTrue(parser.command());
assertStackIs(tree("move{5}"));
I went over my code in class the other day. I won't give you the complete code, but if you would like to do things the same way I did, here's the outline:
public class LogoTokenizer extends Tokenizer implements Iterable { public LogoTokenizer(String inputString); // Constructor static boolean isKeyword(String word); @Override public Object next(); public Iterator iterator(); private class LogoIterator implements Iterator { LogoIterator(String string); // Constructor public boolean hasNext(); public Object next(); public void remove(); } }