CIT 594 Assignment 8: Parser
Spring 2012, David Matuszek

Purposes

General Idea

Your assignment is to write a parser for programs in the language described by the Grammar in the Recognizer assignment. A parser is a program that, given a stream of tokens, creates a tree that represents the structure of the program represented by that stream of tokens. Such a tree is called a parse tree. The "meaning" of this tree, if any, is outside the scope of this assignment.

In this assignment you will continue to build on the previous assignments. To have everything in one place (this will simplify submitting your assignment), create a new RobotParser project, with the following packages:

The Parser class

I am providing parserStarterCode.zip, which you should unzip and incorporate into your new Parser class, replacing the same-named methods in that class. This code can parse <expression>s and the other nonterminals that go to make up expressions. You should read and become familiar with all the provided starter code.

The Recognizer/Parser methods all return a boolean. Don't change this. The Parser methods will manipulate a global Stack<Tree<Token>>, but will continue to return a boolean result.

For example, consider the grammar rule <expression> ::= <term> { <add_operator> <term> }:

In the Recognizer: In the Parser:
public boolean term() {
    if (!factor()) return false;
    while (multiplyOperator()) {
        if (!factor()) {
            error("No term after '*' or '/' or '%'");
        }
    }
    return true;
}
public boolean term() {
    if (!factor()) return false;     // Note 1
    while (multiplyOperator()) {     // Note 2
        swapTopTwoElementsOfStack(); // Note 3
        makeTree();
        if (!factor()) {             // Note 4
            error("No term after '*' or '/' or '%'");
        }
        else {
            makeTree();              // Note 5
        }
    }                                // Note 6
    return true;
}

Notes:

  1. If the first factor() succeeds, it leaves a Tree representing that factor on the stack.
  2. If multiplyOperator() succeeds, it leaves a Tree leaf node on the stack, containing the multiply operator ("*" or "/" or "%").
  3. Make a tree, with the multiplyOperator at the root, and the factor as a child.
  4. The second factor() should succeed. If it does, it leaves a Tree representing that factor on the stack.
  5. Adds the second factor as a child of the multiply operator.
  6. If the loop continues with another multiply operator, the tree created above will be a subtree of it.

Making parse trees

The general idea is that as you recognize things in the input, you build Trees to represent them, and push these Trees onto a stack. As you go along, you pop Trees from the stack, combine them into a larger Tree, and put this new Tree onto the stack. When you are done, you should have exactly one Tree on the stack, representing the complete nonterminal being parsed.

For example, suppose you are trying to parse a <condition> (defined as <expression> <comparator> <expression>) and the string being parsed is "x <= 5 {".

First, it is important to realize that not everything goes into the parse tree. Some tokens, such parentheses, braces, and semicolons, are there to help define the tree structure, or as an aid to the recognizer. Some nonterminals, such as <expression> and <term>, are in the grammar mostly to define the structure that the resulting tree should have. In general, anything that provides structure is represented in the structure of the tree; anything that content is represented as nodes with values.

Second, there may be things in the parse tree that are not explicit in the original program. For example, when you create a procedure node, one of its children will be a parameters node to hold the list of formal parameters.

In the following table I will attempt to show what goes into the parse tree for each grammar rule or rule alternative. Where a rule has more than one alternative, the first column contains only the alternative that is used in the example. Note that all node types have a fixed number of children, with the exception of <list> and <block>, each of which may have any number (including zero) of children.

If there are discrepancies between the grammar given in the Recognizer assignment, and that shown below, the one in the Recognizer assignment is the correct one.

Grammar rule (partial) Example input
What it should push onto the stack
Comments
<variable> ::= <name> foo
variable
Leaf node, value is the name.
<thing> ::= <name> foo
variable
Leaf node, value is the name.
<command> ::=
<thought> | <action>
set foo 5; command A command is either a <thought> or an <action>, so whatever results from creating one of these is the result of creating a <command>.
<action> ::=
<move> <expression> ";"
forward 5;
move
The keyword is the root, and the first child is the Tree that represents the <expression>; it may be significantly more complicated than this.
<action> ::=
"turn" <direction> ";"
turn right;
turn
The keyword is the root, and the argument is its child.
<action> ::=
"take" <thing> ";" |
"drop" <thing> ";"
take rock ;
take
The keyword is the root, and the argument is its child.
<thought> ::=
 "set" <variable> <expression> ";"
set foo 5;
command
The first child is the Tree that represents the <variable>, while the second child is the Tree that represents the (possibly very complex) <expression>.
<thought> ::=
 "repeat" <expression> <block>
repeat 5 {
  turn left;
  stop;
}
repeat
The first child is the Tree that represents the <expression>, while the second child is the Tree that represents the <block>. (The <block> subtree is not shown in detail.)
<thought> ::=
 "while" <condition> <block>
while x < 5 {
  turn left;
}
while
The first child is the Tree that represents the <condition>, (shown explicitly here) while the second child is the Tree that represents the <block>. (The <block> subtree is not shown in detail.)
<thought> ::=
 "if" <condition> <block>
[ "else" <block> ]
if x < 5 {
  stop;
}
if
The first child represents the <condition>, while the second child represents the "then part". If there were an "else part," it would be the third child.
<thought> ::=
"call" <name> { <expression> } ";"
call foo 2 3 6;
do
The first child is the procedure name, while the remaining children are expressions.
<action> ::= "stop" ";" stop ;
stop
Leaf node, value is the keyword.
<block> ::= "{"
    { <command> }
"}"
{
turn left;
forward 5;
turn right;
}
block

A block has zero or more children, each of which is a command.

There is no "block" keyword in the grammar, but when you build the tree, create a block token of type KEYWORD. This will not affect the parsing, but will be used in the tree in place of {...}.

<expression> ::=
[ <addOperator> ]
<term> { <add_operator> <term> }
foo + 5
expression

When an <expression> consists of a single term, the tree is the same as it is for that <term> alone.

<expression> ::=
[ <addOperator> ]
<term> { <add_operator> <term> }
2 + 3 + 6
long-expression
The shape of the tree is very important; it says 2+3 must be done first. Otherwise, what would be be adding to 6?
<expression> ::=
[ <addOperator> ]
<term> { <add_operator> <term> }
-5 unary A unary plus or minus has an expression as its child.
<term> ::= <factor> { <multiply_operator> <factor> } foo * 5
term
When a <term> consists of a single factor, the tree is the same as it is for that <factor> alone
<factor> ::= <name> foo
variable
When a <factor> consists of a name or number, the tree is the same as it is for that <name> or <number>.
<factor> ::= "(" <expression> ")" (foo + 5)
expression
When a <factor> consists of parentheses around an expression, the tree is the same as it is for that <expression>. The parentheses help the parser determine the shape of the tree, but they don't form a part of the tree.
<condition> ::= "seeing" <thing> seeing rock seeing rock The <thing> is a child of the seeing or holding node. For not, the child is a <condition> of some sort.
<condition> ::= <expression> <comparator> <expression> x < 5
condition
The children of a comparator node are arbitrarily complex trees representing <expression>s. Only simple expressions are used in this picture.
<comparator> ::=
"<" | "<="| "==" |
"!=" | ">="| ">"
<
comparator
Leaf node, value is the symbol. Similarly for the other comparators.
<procedure> ::=
"def" <name> { <variable> } <block>
def foo x y {
  forward x;
  turn left ;
  back y;
}
procedure

This is the most complex structure. A <procedure> requires a "header" consisting of a <name> and parameters, and a block of <command>s.

<program> ::= "program" <block> { <procedure> } program {
  forward 5;
  turn left;
  back 1;
}
program

The root node is a program node. The first child is a block of commands, and the remaining children are the procedures (if any).

Correct tree shapes

Note that you do not evaluate expressions in the current assignment! You just need to build the parse tree correctly, so that it could be evaluated.

An operation cannot be performed until its operands are known. When an operation is represented by a node in a binary tree, that operation cannot be performed until the children of that node have been evaluated. (This requires a postorder tree traversal.) Therefore, the structure of the binary tree imposes restrictions on the order in which operations are performed.

For example, consider the expression 20 * 3 / 4. In this expression, it is important to do the multiplication before doing the division. Our parser might construct either of two trees from this expression, only one of which is correct.

This is the correct representation of 20 * 3 / 4 --if you evaluate the left and right children, then do the division, the result is (20 * 3) / 4 = 60 / 4 = 15.

This is a wrong representation--if you evaluate the left and right children, then do the division, .the result is 20 * (3 / 4) = 20 * 0 = 0. (Remember, this is integer division.)

Due date

Your program is due before 6am Tuesday, March 27. Zip up the entire directory for this project, and submit via Blackboard. Only assignments submitted via Blackboard will be accepted--any assignments emailed to me will be discarded without comment. A late penalty of 5 points per day (out of 100 points) will apply.

Because many of you are interviewing this semester, a limited number of 48-hour extensions will be available. To get an extension, email me before 5pm Friday, stating the reason you need the extension. No extensions will be granted after Friday. If you get an extension and fail to get the project in by the extended due date, late points will be counted from the original due date.