| CIT 594 Assignment
7: Parser for Robot Language CIT 594, Spring 2005 |
In this assignment you will continue to build on the previous assignments.
You will need your Tree class from assignment 5, your Recognizer
class from assignment 6, and all applicable JUnit test classes. You will not
use your ArithmeticExpression from assignment 4.
Your assignment is to write a parser for programs in the "robot language". A parser is a program that, given a stream of tokens, creates a Tree that represents the structure of the program represented by that stream of tokens. Such a tree is called a parse tree. The meaning of the program, and its parse tree, is outside the scope of this assignment.
In assignment 6 you wrote a simple "decorator class" to stand in
front of java.util.StringTokenizer and add some features to it--namely,
the ability to "push back" one token, and the ability to classify
each token as one of the following:
"\n")If you have not already done so, you may find it helpful to add the following optional classifications:
hasMoreTokens()."$"
or "123abc". Or you can just assume that you won't
get such tokens.If you add one or both of these, be sure to update yourJUnit tests as well.
Some students have used the Tokenizer just to return Strings, and assigned
a type to each token in the
constructor. After thinking about it a bit, I've decided that this is better
than the way I've been doing it--the Token class can ensure that
its instances are valid, and this simplifies the Tokenizer. (However, either
way works, and there's no need to change things now.)
To the definition of <command>, add the alternative
"do"
<name> { <expression> } <eol>
and, of course, add a suitable JUnit test for this alternative.
None. Use the Tree class exactly as defined in the previous assignment.
You can fix any bugs you find, but if you feel you need some additional methods,
put them someplace other than in this class. For example, you may decide you
need some simplified methods for building the "expected" trees when
you do JUnit testing.
First, so that you don't lose your previous work, make a copy of your
Recognizer class, and change its name to Parser.
(You can do this easily with Eclipse's "Rename" refactoring.) You
will be transforming this class from a recognizer, which simply returns
true or false, into a parser, which creates
a tree corresponding to its input. We will use this parse tree in a later,
final assignment.
One way of changing a Recognizer to a Parser is to modify each method to return
a Tree instead of a boolean. Don't do this.
There is an easier way that allows you to make smaller changes and do unit
testing as you go. That easier way is to keep a "global" Stack
of trees as you go. The methods you adapt from the Recognizer should still return
boolean values, but you can add a method or methods to get the
contents of the stack.
For example, consider the grammar rule
| In the Recognizer: | In the Parser: |
|---|---|
public boolean expression() {
if (!term())
return false;
if (!addOperator())
return true;
if (!expression())
error("Error in expression");
return true;
}
|
public boolean expression() {
if (!term()) // Note 1
return false;
if (!addOperator()) // Note 2
return true;
if (!expression()) // Note 3
error("Error in expression");
makeTree(2, 1, 3); // Note 4
return true;
}
|
|
Notes:
|
|
The above translation from Recognizer method to Parser
method, while correct, builds a tree of the wrong shape for our purposes. As
discussed in class, we need to use the equivalent version of the grammar rule
that was defined in the Recognizer
assignment:
<expression> ::= <term> { <add_operator> <term>
}
| In the Recognizer: | In the Parser: |
|---|---|
public boolean expression() {
if (!term())
return false;
while (addOperator()) {
if (!term()) {
error("Error in expression");
}
}
return true;
}
|
public boolean expression() {
if (!term()) // Note 1, as above
return false;
while (addOperator()) { // Note 2
if (term()) {
makeTree(2, 1, 3); // Note 4
}
else {
error("Error in expression");
}
}
return true;
}
|
First, you should realize that not everything goes into the parse tree. Some things, such parentheses and ends-of-lines, are there to help define the structure or as an aid to the Recognizer. Most everything else does go into the parse tree, though.
In the following table I will attempt to show what goes into the parse tree for each grammar rule or rule alternative. Where a rule has more than one alternative, the first column contains only the alternative that is used in the example.
| Grammar rule (partial) | Example input |
What it should push onto the stack
|
Comments |
|---|---|---|---|
<variable> ::= <name> |
foo |
|
Leaf node, value is the name. |
<object> ::= <name> |
foo |
|
Leaf node, value is the name. |
<command> ::= |
forward 5\n |
![]() |
The keyword is the root, and the first child is the
Tree that represents the <expression>; it may be significantly
more complicated than this. |
<command> ::= |
turn right \n |
![]() |
The keyword is the root, and the argument is its child. |
<command> ::= |
take rock \n |
![]() |
The keyword is the root, and the argument is its child. |
<command> ::= |
set x 5 \n |
![]() |
The first child is the Tree that represents the <variable>,
while the second child is the Tree that represents the (possibly very complex)
<expression>. |
<command> ::= |
|
![]() |
The first child is the Tree that represents the <expression>,
while the second child is the Tree that represents the <block>.
(The <block> subtree is not shown in detail.) |
<command> ::= |
|
![]() |
The first child is the Tree that represents the <condition>,
(shown explicitly here) while the second child is the Tree that represents
the <block>. (The <block> subtree
is not shown in detail.) |
<command> ::= |
if x < 5 {\n |
![]() |
The first child represents the <condition>,
while the second child represents the "then part".
If there were an "else part," it would be the third child. |
<command> ::= |
|
![]() |
The first child is the procedure name, while the remaining children are expressions. |
<command> ::= "stop" <eol>
|
stop \n |
|
Leaf node, value is the keyword. |
<block> ::= "{"
<eol> |
{ \n |
![]() |
A block has zero or more children, each of which is a command. There is no "block" keyword in the grammar, but you can invent one for use in the Tree. |
<expression> ::= <term> |
foo + 5 |
![]() |
When an |
<expression> ::= <term> |
2 + 3 + 6 |
![]() |
The shape of the tree is very important; it says 2+3
must be done first. |
<term> ::= <factor> |
foo * 5 |
![]() |
When a <term> consists of a single factor,
the tree is the same as it is for that <factor> alone |
<factor> ::= <name> |
foo |
|
When a <factor> consists of a name or number,
the tree is the same as it is for that <name> or <number>. |
<factor> ::= "("
<expression> ")" |
(foo + 5) |
![]() |
When a <factor> consists of parentheses
around an expression, the tree is the same as it is for that
<expression>. |
<condition> ::= <expression> <comparator>
<expression> |
x < 5 |
![]() |
The children of a comparator node are arbitrarily complex
trees representing <expression>s. Only simple expressions
are used in this picture. |
<comparator> ::= "<"
| "="
| ">" |
< |
|
Leaf node, value is the symbol. Similarly for the two other comparators. |
<procedure> ::= "to"
<name> |
to foo x y\n |
![]() |
This is the most complex structure. A |
<program> ::= <command> { <command>
} |
forward 5\n |
![]() |
The root node is a "program" node. The first child is an implicit "block" node of commands, and the remaining children are the procedures (if any). |
<eol> ::= "\n"
|
\n |
|
No tree is built. If something was put on the stack in the
process of recognizing the <eol>, it should be removed. |
At this point there is no "main" nonterminal--you should, for example,
be able to make calls like
Parser p = new Parser("x<5");
if(p.condition()) {...}
or
Parser p = new Parser("set x x+1");
Later we will be using program as the "main" nonterminal, but not
yet.
Do JUnit testing of your parser methods. This will involve not only testing for the correct boolean result, but also testing whether the correct tree was made. It might be a good idea to think about the best way to construct all the trees you will need for JUnit testing.
Thursday, March 24, before midnight.