CIT 594 Assignment 4: Parser Addendum
Spring 2004, David Matuszek

Yet another fix to the grammar:

Add this BNF rule:

<program> ::= <command> { <command> } { <procedure> }

That is, a "program" consists of one or more commands, followed by zero or more procedures.

The parse tree for a <program> is as follows: The root node holds a Token whose type is Token.KEYWORD and whose value is the string "program". The left child is a list (type Token.KEYWORD, value "list") of commands; the right child is a similar list of procedures.

<program> ::= <command> { <command> } { <procedure> } penup \n
pendown \n

The root node is a "program" node, and each child is a "list" node. Note that an empty list is represented by a "list" node with no children.

Token values:

Three new "keyword" types are added to the grammar. The Token class should now have a public static final int called Token.KEYWORD, distinct from the other types provided by this class.

The Tokenizer needs to recognize keywords such as if, blue, end, etc., and the newly-added program, and return these with type Token.KEYWORD. The Tokenizer should not recognize "list" or "header" or "program" as keywords; if it encounters these, they should be returned with type Token.NAME.

However, the Token(String value, int type) constructor must be able to create Tokens with these value "list" or "header" and with type Token.KEYWORD. Such tokens will be created by the Parser, not by the Tokenizer.

Building the parse tree:

The interplay between the Stack and the various BinaryTrees is central to this program.

The assignment is very specific about what kind of parse tree is built for each nonterminal type, but it isn't clear about is in the parse tree, or in the stack. Here are the two simple rules you should follow:

  1. The Stack holds BinaryTrees, and only BinaryTrees.

  2. The value field of every BinaryTree must be a Token.

In order to force everything into binary trees (where each node has at most two children), we need to define some intermediate nodes. One of these is the general-purpose list node. The value of this BinaryTree node is a Token, with value = "list" and type = KEYWORD. (Don't let yourself get confused by the fact that both BinaryTrees and Tokens have a value field.)


Note: In these examples, I assume you are using my name(), name(String), number(), and symbol() methods, each of which leaves, if successful, one BinaryTree node on the stack. That BinaryTree node contains the desired Token. I further assume that you have added your own convenience routine, keyword(String).

You don't have to use these methods if you don't like them. You are free to do anything you like about private methods. It's only the public API that must meet the specifications of the assignment. This includes the public instance variable stack (because otherwise testing is much more difficult).

To parse the command "color red \n":

According to the grammar,
     <command> ::= ... | "color" <colorname> <eol> | ...

The Parser's command() method calls keyword("color").
keyword("color") succeeds and leaves a BinaryTree on the stack. The value of this BinaryTree is the appropriate keyword token.
command() now calls colorname() which finds, as a Token, the keyword "red".
colorname() constructs a BinaryTree with this Token as its value, and no children. It pushes the BinaryTree onto the stack.
command() now calls eol() which finds an EOL Token.
I haven't provided eol() for you, but it should leave a BinaryTree on the stack containing an EOL Token.
command() has now found everything it needs to make a command.
It removes the three BinaryTrees from the stack (the eol, the color name, and the color keyword).
command() constructs a new BinaryTree with the "color" node as its value and the "red" node as its left child. The eol node is discarded (ignored).
command() pushes this new BinaryTree onto the stack, and returns true.

Here is what is important to note:

To parse the factor "(3 * n + 1)":

According to the grammar,
     <factor> ::= <name> | <number> | "(" <expression> ")"

The Parser's factor() method calls name(), which fails (returns false).
The stack is unchanged.
The factor() method then calls number(), which also fails.
The stack is unchanged.
The factor() method then calls symbol("("), which succeeds.
The stack now contains a BinaryTree whose value is a Token that represents this symbol.
The factor() method next calls expression(), which succeeds.
The stack now contains two BinaryTrees, the top one representing the <expression> and the next one down representing the "(". (There may be other things deeper in the stack, but these are the only ones factor() cares about.)
The factor() method next calls symbol(")"), which succeeds.
The stack now contains three BinaryTrees.
factor() has now found everything it needs to make a <factor>.
It removes the three BinaryTrees from the stack, discards the ones representing parentheses, and pushes the one representing an <expression> back onto the stack.

Here is my code for doing this. It is slightly different from the above description: Since I know I won't be using the parentheses in the BinaryTree, my code just discards (pops) them as it finds them.

    public boolean factor() {
        if (name())
            return true;
        if (number())
            return true;
        if (symbol("(")) {
            if (!expression())
                error("Error in parenthesized expression");
            if (!symbol(")"))
                error("Unclosed parenthetical expression");
            return true;
        return false;