CIT 594 Assignment 3: Tree Generation
Spring 2010, David Matuszek

Purposes:

General idea

Write a text file containing BNF grammar for English sentences. Here is an example you can start with, but you should either write your own, or "improve" this one. In either case, replace the vocabulary items (the "terminals") with your own vocabulary items.

Read in your BNF grammar and store it in a Grammar object, as described below.

Specifics

Reading and storing the grammar

A definition consists of two parts: The definiendum, which is the term being defined, and the definiens, which is the part that defines it. (I think these are ugly terms, but I didn't invent them.) In BNF we use the ::= symbol to separate the two parts, for example,

definiendum ::= definiens
<assignment_statement> ::= <variable> = <expression>

To represent a simple definition like this in Java, we might use a String for the definiendum and a List of Strings for the definiens:

definiendum (=key) definiens (= value)
"<assignment_statement>" ["<variable>", "=", "<expression>"]

(By the way, this isn't legal syntax for a list in Java; I'm just trying to explain the idea.) In order to be able to look up the definition (definiens) of a term (definiendum), we can put the whole thing in a Map (for example, a HashMap), using the definiendum as the key and the definiens as the value. OK, I really dislike these words, so I'm going to start using term=definiendum and definition=definiens.

The problem with this is that a term may have several definitions (definientia--told you it was ugly!). So a List of Strings isn't enough: We need to use a List of Lists of Strings. For example:

term (=key) definition (= value)
"<identifier>" [ ["<letter>"],
  ["<identifier>", "<letter>"],
  ["<identifier>", "<digit>"] ]

In BNF, the above could be written as
     <identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>
or as
     <identifier> ::= <identifier> <letter>
     <identifier> ::= <letter>
     <identifier> ::= <identifier> <digit>

or in several other ways. The various possible definitions (definientia) could be grouped two in one line and one in the other, and they could be in any order (order doesn't matter).

Your first task is to read in and store the grammar.

Using the grammar

In a grammar for English, the "top level" term might be <sentence>; for a programming language, it might be <program>. The grammar itself doesn't tell you what top-level term to use; that's up to you.

Creating your own grammar

To give you some practice in writing BNF, create a data set (a set of BNF rules) to describe a very very simple programming language. You define the syntax of the language, but it should have (1) some kind of print statement, (2) some kind of if statement, (3) some kind of while loop, and (4) some kind of assignment statement. To keep things very simple, no expression may contain parentheses or more than one operator--that is, b + 5 is as complicated as an expression should be. Also, don't depend on indentation (as in Python), because that's tricky to do in BNF. Use <start> ::= <program> as the "top level" definition.

Program structure

In the previous section I described how to do things, but not where to do them. This section describes the structure that I want your program to have.

The Grammar class

The following should be in your Grammar class:

public Grammar()
This is the constructor. It creates a new, empty grammar. The one thing this constructor needs to do is to create a new, empty HashMap to hold definitions. The keys will be Strings, while the values will be ArrayLists of ArrayLists of Strings. Use generics to enforce this.
public void addRule(String ruleText) throws IllegalArgumentException
This method adds rules to the grammar. The argument is a String containing the nonterminal being defined, the symbol ::=, and one or more definitions separated by the | symbol, meaning "or."

This is where you will need to use the StringTokenizer, to break up the input argument into tokens.

If the rule has more than one definition, treat it as more than one rule. For example, if ruleText is "<sign> ::= + | -", you should get the same result as if you had called addRule with "<sign> ::= +", and then called it again with "<sign> ::= -".

For each rule: If the key (the nonterminal being defined) is not in the HashMap, add it along with its definition. But if it is already in the HashMap, then you should add it to the list of definitions already present
public static boolean isTerminal(String token)
Returns true if the given token is a terminal, false if it is a nonterminal. (A token is a nonterminal if it begins with a '<', ends with a '>', and is at least 3 characters long.)
public ArrayList<ArrayList<String>> getDefinitions(String nonterminal)
Given a nonterminal, returns its definitions. (If given a terminal, returns null.)
public static Grammar read()
Reads in and returns a Grammar object. (You can use LineReader and addRule in this method.)
public void print()
Prints this Grammar neatly, on multiple lines.

Testing

The Generator class

This is the "main" class for this assignment; it has a public static main(String[] args) method, and it uses your Tree class and the above Grammar class. It should contain at least the following:

No constructor
You can just use the default constructor.
public static void main(String[] args)
I recommend "escaping static" by having this method create a Generator and calling a non-static method of it; for example, you might write a void main() method, and have this method call new Generator().main().

Whether you do it here or in a separate method (like main()), the program should:
  1. Ask the user for a file containing a grammar, and read it in (creating a Grammar object).
  2. Print the grammar.
  3. Call generateRandomTree to create a random tree using this grammar, with <start> at the root.
    • Important note: Although the result should be random, that doesn't mean that, when a nonterminal has several definitions, each has to be equally likely to be chosen. If all definitions are equally likely, you might get extremely large (or even infinite!) trees. It's best if you randomly choose "shorter" definitions more often than "longer" ones.
  4. Print the tree, using the toLongString method of your Tree class.
  5. Print the tree as a "sentence," using the printAsSentence method of this class.
  6. Ask the user if she wants to generate another tree. If so, continue from step 3.
  7. Ask the user if she wants to read in another grammar. If so, continue from step 1.
public Tree<String> generateRandomTree(Grammar grammar, String term)
Expands the given term into a Tree of Strings, where the leaves of the tree are all terminals. If the term is itself a terminal, returns a Tree consisting of a single node, with term as its value. This method is to be written as a recursive method.
public void printAsSentence(Tree<String> tree)
Traverses the tree, printing the value each leaf. The first leaf encountered should be printed with an initial capital letter; put a space between the values; and end with a period. This method is to be written as a recursive method.

Notes

All non-private elements should have a well-written Javadoc comment. Remember that a Javadoc comment is supposed to tell the user everything s/he needs to know in order to use that element.

Generate the Javadoc and look it over before you turn in your program.

Provided files

LineReader.java and sample-grammar.txt.

Due date

Zip up and turn in your complete project, including all test files and the generated Javadoc, via Blackboard, before midnight, Friday, January February 12.