CIT 594 Assignment 7: Parser for Robot Language
CIT 594, Spring 2005

Purposes:

General Idea:

In this assignment you will continue to build on the previous assignments. You will need your Tree class from assignment 5, your Recognizer class from assignment 6, and all applicable JUnit test classes. You will not use your ArithmeticExpression from assignment 4.

Your assignment is to write a parser for programs in the "robot language". A parser is a program that, given a stream of tokens, creates a Tree that represents the structure of the program represented by that stream of tokens. Such a tree is called a parse tree. The meaning of the program, and its parse tree, is outside the scope of this assignment.

Changes to the Tokenizer:

In assignment 6 you wrote a simple "decorator class" to stand in front of java.util.StringTokenizer and add some features to it--namely, the ability to "push back" one token, and the ability to classify each token as one of the following:

If you have not already done so, you may find it helpful to add the following optional classifications:

If you add one or both of these, be sure to update yourJUnit tests as well.

Some students have used the Tokenizer just to return Strings, and assigned a type to each token in the new Token(String) constructor. After thinking about it a bit, I've decided that this is better than the way I've been doing it--the Token class can ensure that its instances are valid, and this simplifies the Tokenizer. (However, either way works, and there's no need to change things now.)

Changes to the Recognizer:

To the definition of <command>, add the alternative
     "do" <name> { <expression> } <eol>

and, of course, add a suitable JUnit test for this alternative.

Changes to the Tree class:

None. Use the Tree class exactly as defined in the previous assignment. You can fix any bugs you find, but if you feel you need some additional methods, put them someplace other than in this class. For example, you may decide you need some simplified methods for building the "expected" trees when you do JUnit testing.

The Parser class:

First, so that you don't lose your previous work, make a copy of your Recognizer class, and change its name to Parser. (You can do this easily with Eclipse's "Rename" refactoring.) You will be transforming this class from a recognizer, which simply returns true or false, into a parser, which creates a tree corresponding to its input. We will use this parse tree in a later, final assignment.

One way of changing a Recognizer to a Parser is to modify each method to return a Tree instead of a boolean. Don't do this. There is an easier way that allows you to make smaller changes and do unit testing as you go. That easier way is to keep a "global" Stack of trees as you go. The methods you adapt from the Recognizer should still return boolean values, but you can add a method or methods to get the contents of the stack.

For example, consider the grammar rule
<expression> ::= <term> | <term> <add_operator> <expression>:

In the Recognizer: In the Parser:
public boolean expression() {
    if (!term())
        return false;
    if (!addOperator())
        return true;
    if (!expression())
        error("Error in expression");
    return true;
}
public boolean expression() {
    if (!term())         // Note 1
        return false;
    if (!addOperator())  // Note 2
        return true;
    if (!expression())   // Note 3
        error("Error in expression");
    makeTree(2, 1, 3);   // Note 4
    return true;
}

Notes:

  1. If term() succeeds, it leaves a Tree representing that term on the stack.
  2. If addOperator() succeeds, it leaves a Tree leaf node on the stack, containing the add operator ("+" or "-").
  3. If expression() succeeds, it leaves a Tree representing that expression on the stack.
  4. If we get here, my makeTree method removes the three preceeding Trees from the stack, creates a new one that represents this expression, and leaves it on the stack. I don't care if you use makeTree, but you should definitely use some kink of helper methods to manipulate trees in the stack.

The above translation from Recognizer method to Parser method, while correct, builds a tree of the wrong shape for our purposes. As discussed in class, we need to use the equivalent version of the grammar rule that was defined in the Recognizer assignment:
<expression> ::= <term> { <add_operator> <term> }

In the Recognizer: In the Parser:
public boolean expression() {
    if (!term())
        return false;
    while (addOperator()) {
        if (!term()) {
            error("Error in expression");
        }
    }
    return true;
}
public boolean expression() {
    if (!term())               // Note 1, as above
        return false;
    while (addOperator()) {    // Note 2
        if (term()) {
            makeTree(2, 1, 3); // Note 4
        }
        else {
            error("Error in expression");
        }
    }
    return true;
}

Making parse trees:

First, you should realize that not everything goes into the parse tree. Some things, such parentheses and ends-of-lines, are there to help define the structure or as an aid to the Recognizer. Most everything else does go into the parse tree, though.

In the following table I will attempt to show what goes into the parse tree for each grammar rule or rule alternative. Where a rule has more than one alternative, the first column contains only the alternative that is used in the example.

Grammar rule (partial) Example input
What it should push onto the stack
Comments
<variable> ::= <name> foo
Leaf node, value is the name.
<object> ::= <name> foo
Leaf node, value is the name.
<command> ::=
<move> <expression> <eol>
forward 5\n
The keyword is the root, and the first child is the Tree that represents the <expression>; it may be significantly more complicated than this.
<command> ::=
"turn" <direction> <eol>
turn right \n
The keyword is the root, and the argument is its child.
<command> ::=
"take" <object> <eol> |
"drop" <object> <eol>
take rock \n
The keyword is the root, and the argument is its child.
<command> ::=
 "set" <variable> <expression> <eol>
set x 5 \n
The first child is the Tree that represents the <variable>, while the second child is the Tree that represents the (possibly very complex) <expression>.
<command> ::=
 "repeat" <expression> <block> <eol>
repeat 5 {\n
  turn left\n
  stop \n
}\n
The first child is the Tree that represents the <expression>, while the second child is the Tree that represents the <block>. (The <block> subtree is not shown in detail.)
<command> ::=
 "while" <condition> <block> <eol>
while x<5{\n
  turn left\n
}\n
The first child is the Tree that represents the <condition>, (shown explicitly here) while the second child is the Tree that represents the <block>. (The <block> subtree is not shown in detail.)
<command> ::=
 "if" <condition> <block>
[ "else" <block> ]
if x < 5 {\n
  stop \n
}\n
The first child represents the <condition>, while the second child represents the "then part". If there were an "else part," it would be the third child.
<command> ::=
"do" <name> { <expression> } <eol>
do foo 2 3 6\n
The first child is the procedure name, while the remaining children are expressions.
<command> ::= "stop" <eol> stop \n
Leaf node, value is the keyword.
<block> ::= "{" <eol> { <command> } "}" <eol> { \n
turn left\n
forward 5\n
turn right\n
} \n

A block has zero or more children, each of which is a command.

There is no "block" keyword in the grammar, but you can invent one for use in the Tree.

<expression> ::= <term> { <add_operator> <term> } foo + 5

When an <expression> consists of a single term, the tree is the same as it is for that <term> alone.

<expression> ::= <term> { <add_operator> <term> } 2 + 3 + 6
The shape of the tree is very important; it says 2+3 must be done first.
<term> ::= <factor> { <multiply_operator> <factor> } foo * 5
When a <term> consists of a single factor, the tree is the same as it is for that <factor> alone
<factor> ::= <name> foo
When a <factor> consists of a name or number, the tree is the same as it is for that <name> or <number>.
<factor> ::= "(" <expression> ")" (foo + 5)
When a <factor> consists of parentheses around an expression, the tree is the same as it is for that <expression>.
<condition> ::= <expression> <comparator> <expression> x < 5
The children of a comparator node are arbitrarily complex trees representing <expression>s. Only simple expressions are used in this picture.
<comparator> ::= "<" | "=" | ">" <
Leaf node, value is the symbol. Similarly for the two other comparators.
<procedure> ::= "to" <name> { <variable> } <eol>
{ <command> } "end" <eol>
to foo x y\n
  forward x\n
  turn left \n
  back y\n
end \n

This is the most complex structure. A <procedure> requires a "header" consisting of a <name> and an implicit "block" of <command>s.

<program> ::= <command> { <command> } { <procedure> } forward 5\n
turn left\n
back 1\n

The root node is a "program" node. The first child is an implicit "block" node of commands, and the remaining children are the procedures (if any).

<eol> ::= "\n" { "\n" } \n

 

 

No tree is built. If something was put on the stack in the process of recognizing the <eol>, it should be removed.

At this point there is no "main" nonterminal--you should, for example, be able to make calls like
     Parser p = new Parser("x<5");
     if(p.condition()) {...}

or
     Parser p = new Parser("set x x+1");
     
if(p.command()) {...}.
Later we will be using program as the "main" nonterminal, but not yet.

Do JUnit testing of your parser methods. This will involve not only testing for the correct boolean result, but also testing whether the correct tree was made. It might be a good idea to think about the best way to construct all the trees you will need for JUnit testing.

Due date:

Thursday, March 24, before midnight.