CIT 594 Assignment 7:
Parser
Spring 2009, David Matuszek
Your assignment is to write a parser for programs in the language described by the Grammar in the Recognizer assignment. A parser is a program that, given a stream of tokens, creates a tree that represents the structure of the program represented by that stream of tokens. Such a tree is called a parse tree. The "meaning" of this tree, if any, is outside the scope of this assignment.
In this assignment you will continue to build on the previous assignments.
You should leave your source code files in their original projects, and use
import statements so you can access them.
You will need the Tokenizer, Tree, and
Recognizer projects. You should create an
AllTests.java JUnit class in the Parser project to
include all the previous JUnit tests along with your new Parser
tests.
Tokenizer:If you have not already done so, you should update your list (or array) of
keywords in the Tokenizer. A keyword is any of the words defined in the
grammar--move, if, red,
etc.
This change should not break your Tokenizer class,
with the sole exception that you will probably have to update your tests for
keywords. Use your JUnit tests to make sure that Tokenizer still
works correctly. If you uncover any errors in your Tokenizer,
first write JUnit tests to catch those errors, then add the
corrections to Tokenizer.
Parser class:I am providing starter code for Parser.java
and ParserTest.java. This code can
parse <expression>s and the other nonterminals that go
to make up expressions. You should read and become familiar with all
the provided starter code.
You will need to merge your Recognizer with this starter
code, keeping the most complete versions of each method. You will be
transforming your recognizer, which simply returns true or
false, into a parser, which creates a tree corresponding
to its input. We will use this parse tree later in this course.
The Recognizer methods all return a boolean.
Don't change this. The Parser methods
will manipulate a global Stack<Tree<Token>>,
but will continue to return a boolean result.
For example, consider the grammar rule
<expression> ::= <term> { <add_operator> <term>
}:
| In the Recognizer: | In the Parser: |
|---|---|
public boolean expression() {
if (!term()) return false;
while (addOperator()) {
if (!term()) {
error("Error after '+' or '-'");
}
return true;
}
|
public boolean expression() {
if (!term()) return false; // Note 1
while (addOperator()) { // Note 2
if (!term()) { // Note 3
error("Error after '+' or '-'");
// Add code here // Note 4
}
return true;
}
|
|
Notes:
|
|
The general idea is that as you recognize things in the input, you build Trees to represent them, and push these Trees onto a stack. As you go along, you pop Trees from the stack, combine them into a larger Tree, and put this new Tree onto the stack. When you are done, you should have exactly one Tree on the stack, representing the complete nonterminal being parsed.
For example, suppose you are trying to parse a
<condition> (defined as <expression>
<comparator> <expression>) and the string being parsed is
"x < 5 { \n". You will parse the x as an
<expression> by calling the expression()
method, which will leave a Tree representing this expression on the stack.
You do similar things for the <comparator> and the second
<expression>. Then you will pop these three Trees from the
stack, combine them into a single tree (with the "<" Tree
node as the root), and put the result back on the stack. The remainder of the
input string (in this case, "{ \n") is left for later
use.
First, it is important to realize that not everything goes into the
parse tree. Some tokens, such parentheses, braces, and newlines, are there to
help define the tree structure, or as an aid to the recognizer. Some
nonterminals, such as <expression> and
<term>, are in the grammar mostly to define the structure
that the resulting tree should have. In general, anything that provides
structure is represented in the structure of the tree; anything that
provides content is represented as nodes with values.
Second, there may be things in the parse tree that are not explicit in the original program. For example, when you create a program node, one of its children is a block to hold the statements in the program (even though the grammar does not call for a block), and another is a parameters node to hold the list of formal parameters.
In the following table I will attempt to show what goes into the parse
tree for each grammar rule or rule alternative. Where a rule has more than
one alternative, the first column contains only the alternative that is used
in the example. Note that all node types have a fixed number of children,
with the exception of <list> and
<block>, each of which may have any number (including
zero) of children.
If there are discrepancies between the grammar given in the Recognizer assignment, and that shown below, the one in the Recognizer assignment is the correct one.
| Grammar rule (partial) | Example input |
What it should push onto the stack
|
Comments |
|---|---|---|---|
<variable> ::= <NAME> |
"foo" |
|
Leaf node, value is the name. |
<command> ::= |
"forward 5 \n" |
|
The (one) child is the Tree that represents the
<expression>; the expression may be significantly more
complicated than this. |
<command> ::= |
"left 45\n" |
The (one) child is the Tree that represents the
<expression>. |
|
<command> ::= |
"penup \n" |
|
Leaf node, value is the keyword. |
<command> ::= |
"pendown \n" |
|
Leaf node, value is the keyword. |
<command> ::= |
"color red \n" |
|
The child is the Tree that represents the
<color>. |
<command> ::= |
"home \n" |
|
Leaf node, value is the keyword. |
<command> ::= |
"set x 5 \n" |
|
The first child is the Tree that represents the
<variable>, while the second child is the Tree that
represents the (possibly complex) <expression>. |
<command> ::= |
"repeat 5 { \n |
|
The first child is the Tree that represents the
<expression>, while the second child is the Tree that
represents the <block>. |
<command> ::= |
"while x < 5 { \n |
|
The first child is the Tree that represents the
<condition>, while the second child is the Tree that
represents the <block>. |
<command> ::= |
"if x < 5 {\n |
|
The first child is the Tree that represents the
<condition>, while the second child is the Tree that
represents the <block>. |
<command> ::= |
|
|
The first child is the Tree that represents the
<condition>, the second child is the Tree that
represents the first <block>, and the third child
represents the second <block>. |
<command> ::= |
"call foo 1 2 3\n" |
|
The (required) first child of a call command is a name,
while the second child is a (possibly empty) list of expressions. |
<color> ::= "red" |
"red" |
|
Leaf node, value is the keyword. Similarly for each other color name. |
<block> ::= "{"
<end> { <command> } "}"
<end> |
|
|
Each command is a child (subtree) of the block node. (To keep the diagram small, I used only simple commands; each command might be a large, complex subtree.) |
<comparator> ::= "<" |
"=" | ">" |
"<" |
|
Leaf node, value is the symbol. Similarly for the two other comparators. |
<condition> ::= <expression>
<comparator> <expression> |
"x < 5" |
|
The children of a comparator node are arbitrarily complex
binary trees representing <expression>s. Only simple
expressions are used in this picture. |
<procedure> ::= "define" <name> {
<variable> } <block> |
"define foo x y z { \n |
|
This is the most complex structure. A A "header" node is the parent of the |
<program> ::= <command> {
<command> } { <procedure> } |
"penup \n |
|
The root node is a "program" node, and each child is a "list" node. Note that an empty list is represented by a "list" node with no children. |
<end> ::= "\n" { "\n"
} |
"\n" |
|
No binary tree is built. If something was put on the
stack in the process of recognizing the <eol>, it
should be removed. |
An operation cannot be performed until its operands are known. When an operation is represented by a node in a binary tree, that operation cannot be performed until the children of that node have been evaluated. (This requires a postorder tree traversal.) Therefore, the structure of the binary tree imposes restrictions on the order in which operations are performed.
For example, consider the expression 20 * 3 / 4. In this
expression, it is important to do the multiplication before doing the
division. Our parser might construct either of two trees from this
expression, only one of which is correct.
![]() |
This is the correct representation--if you evaluate the left
and right children, then do the division, the result is (20 * 3) /
4 = 60 / 4 = 15. |
![]() |
This is a wrong representation--if you evaluate the left and
right children, then do the division, .the result is |
Note that you do not evaluate expressions in the current assignment. However, you need to build the parse tree correctly so that you can evaluate it correctly in a subsequent assignment.
Parser.java and ParserTest.java.
Tuesday, March 30, before midnight. Turn in your program via Blackboard, and zip and include
all necessary packages from previous projects.
(Open all the required projects, close the others, choose
File -> Export -> Jar file -> Next>, and check the projects
to be included.)
It is your responsibility to check and make sure that you have correctly submitted the correct files via Blackboard. Errors in the submission process will not be taken as an excuse for late homework.