| CIT 594 Assignment
8: Parser for "Bugs" Language Spring 2008, David Matuszek |
In this assignment you will continue to build on the previous assignments. Since there will be some code changes, you should create a new project and copy the relevant Java files into it--don't make changes to your earlier projects (except as noted below).
You will need the Token, Tree, SyntaxException,
and
Recognizer classes and their corresponding
JUnit test classes.
Add a method
to yourboolean keyword(String expectedKeyword) Recognizerclass, similar to the. It should accept as keywords all the quoted English words specified in the grammar;boolean name(String expectedName) nameshould be modified to reject these same words. Test these methods. I recommend (but don't require) that you add a newToken.Type KEYWORD.
Your assignment is to write a parser for programs in the "Bugs" language. A parser is a program that, given a stream of tokens, creates a tree that represents the structure of the program represented by that stream of tokens. Such a tree is called a parse tree. The meaning of the program, if any, is outside the scope of this assignment.
We are dealing with fairly large data structures, and it is important to be consistent. Otherwise, things get too confusing. Here is the way I have it designed:
value field in a Token always holds a String.value field in a Tree node
always holds a Token. Thus, our Tree declarations are of the formTree<Token> tree
= new Tree<Token>();public Stack<Tree<Token>> stack
= new Stack<Tree<Token>>(); You will find it very helpful to remember this design and keep it in mind as you program.
Parser class:First, make a copy of your Recognizer class, and change
its name to Parser. You will be transforming this class from
a recognizer, which simply returns true or false,
into a parser, which creates a tree corresponding to its input. We will
use this parse tree later in this course.
One way of doing this transformation is to change each Recognizer
method to return a Tree instead of a boolean. Don't do this.
There is a much easier way that allows you to make smaller changes and do unit
testing as you go. That easier way is to keep a "global" Stack
of Trees as you go. As you step through the parts of a definition, those parts
go onto the stack; when you reach the end, you remove those parts from the
stack, combine them into a tree for the thing being defined, and put the tree
onto the stack.
For example, consider the grammar rule
| In the Recognizer: | In the Parser: |
|---|---|
public boolean expression() {
|
public boolean expression() {
|
|
Notes:
|
|
The general idea is that as you recognize things in the input, you build Trees
to represent them, and push these Trees onto a stack. As you go along, you pop
Trees from the stack, combine them into a larger Tree, and put this new Tree
onto the stack. When you are done, you should have exactly one Tree on the stack,
representing the complete <program>.
For example, suppose you find "x = 5\n". You will recognize
the x as a <variable>, make it into a (single
node) Tree, and put it on the stack. Similarly for the "="
symbol and the "5" <expression>. Then you will
pop these three Trees from the stack, combine them into a single tree (with
the "=" Tree node as the root), and put the result back
on the stack.
First, you should realize that not everything goes into the parse tree.
Some tokens, such parentheses and ends-of-lines, are there to help define the
structure or as an aid to the recognizer. Some nonterminals, such as <expression>
and <term>, are in the grammar mostly to define the structure
that the resulting tree should have. In general, anything that provides structure
is represented in the structure of the tree; anything that provides content
is represented as nodes with values.
Second, there may be things in the parse tree that are not explicit in the original Bugs program. For example, a block occurs in many places in the grammar, but there is no "block" keyword. As another example, when you create a method node, one of its children is a parameters node to hold the list of formal parameters.
In the following table I will attempt to show what goes into the parse tree for each grammar rule or rule alternative. Where a rule has more than one alternative, the first column contains only the alternative that is used in the example.
| Icon | Intended Meaning |
|---|---|
| A rectangle indicates a specific type of Tree node, representing the action to be taken when the node is evaluated. In this example, a "block" Tree node, which indicates that its children (commands) are to executed in order. Many, but not all, nodes like this are named after keywords. | |
The "cloud" indicates that a Tree of the appropriate
type belongs here. In this example, the "expression" means
that a Tree built from an Note: For space reasons, |
|
A rounded rectangle indicates that a single terminal
of the indicated type belongs here. In this example, "number" might be
"5"
or "83", but cannot be a name or a more complicated
expression. |
Note #1: Some nodes, such as block and var,
have zero or more children, all of the same kind. Other nodes, such as program and Bug,
need to allow for different kinds of children, but may have more than one of
some kinds (for example, a program has an Allbugs node,
but can contain many Bug nodes). These more complicated nodes have
an intermediate list child.
Note #2: "Optional" elements, such as the Allbugs part
of a program, may be omitted from the program text, but must
be present in the tree if they are needed in order to keep the positions of
other children correct. For example, the second child of a program is
always a list. However, a node that represents an omitted text
element (Allbugs in this example) need not have children.
Another
way to say this is: Any node in the tree (other than variable nodes, such
as block and list) must have either the correct number of children, or none
at all.
| Grammar Rule | Tree | Comments |
|---|---|---|
<program> ::=
[ <allbugs code> ] |
![]() |
A program node has exactly two children: An Allbugs node (which must be present, but doesn't need to contain any code),
and a list node with one or more Bug children. |
<allbugs code> ::=
"Allbugs" "{" <eol> |
![]() |
An Allbugs node has exactly two children:
A list node with zero or more var children,
and a list node
with one or more function children. |
<bug definition> ::=
"Bug" <name> "{" <eol> |
![]() |
A Bug node has exactly five children: a name, an
initially tree, a list with var children, a block, and a list with
function children. |
<var declaration> ::=
"var" <NAME> { "," <NAME> } <eol> |
![]() |
A var node has zero or more children, each of which
is a variable name. |
<initialization block> ::=
"initially" <block> |
![]() |
An <initialization block> consists of an initially node with one child, a block. |
<block> ::=
"{" <eol>
{ <command> }
"}" <eol> |
A block node has zero or more commands as
children. |
|
<move action> ::=
"move" <expr> <eol>
<turn action> ::=
"turn" <expr> <eol>
<turnto action> ::=
"turnto" <expr> <eol>
<return statement> ::=
"return" <expr> <eol> |
![]() |
A <move action> requires exactly
one <expression>.
<turn action>, <turnto action> and
<return statement> all have the same structure. |
<moveto action> ::= "moveto" <expr> "," <expr> <eol> |
![]() |
A <moveto action> requires exactly
two <expression>s. |
<line action> ::=
"line" <expr> "," <expr> ","
<expr> "," <expr> <eol> |
![]() |
A <line action> requires exactly
four <expression>s. |
<assignment statement> ::=
<variable> "=" <expression> <eol> |
![]() |
|
<loop statement> ::=
"loop" <block> |
![]() |
|
<exit if statement> ::=
"exit" "if" <expression> <eol> |
![]() |
|
<switch statement> ::=
"switch" "{" <eol>
{ "case" <expression> <eol>
{ <command> } }
"}" <eol> |
![]() |
|
<color statement> ::=
"color" <KEYWORD> <eol> |
![]() |
|
<function definition> ::=
"define" <NAME>
[ "using" <variable>
{ "," <variable> } ] <block> |
![]() |
|
<function call> ::=
<NAME> <parameter list>
<do statement> ::=
"do" <variable> [ <parameter list> ] <eol>
|
![]() |
A <function call> occurs in an
expression, while a <do statement> occurs as an independent
statement, but they are represented by the same kind of Tree. |
<color statement> ::=
"color" <KEYWORD> <eol> |
![]() |
The grammar I provided does not distinguish between color names and other keywords. We can do this later. |
<expression>, <arithmetic
expression>, <term>, <factor> |
![]() |
In any sort of expression, the top node is one of :
Parentheses are represented implicitly in the shape of the tree, not explicitly. |
An operation cannot be performed until its operands are known. When an operation is represented by a node in a binary tree, that operation cannot be performed until the children of that node have been evaluated. (This requires a postorder tree traversal.) Therefore, the structure of the binary tree imposes restrictions on the order in which operations are performed.
For example, consider the expression 20 * 3 / 4
![]() |
This is the correct representation--if you evaluate the left and
right children, then do the division, the result is . |
![]() |
This is a wrong representation--if you evaluate the left and right
children, then do the division, .the result is |
Note that you do not evaluate expressions in the current assignment. However, you need to build the parse tree correctly so that you can evaluate it correctly in a subsequent assignment.
Here are files Parser.java and ParserTest.java.
You can probably just copy
ParserTest.java, but (as noted above), you should make a copy
of your Recognizer.java and rename it Parser.java.
Then you can replace similarly named methods with methods from my starter code.
Thursday, March 22, Saturday, March 24, before midnight. Turn in your
program via Blackboard,
and follow these conventions:
package declarations in your Java files.