CIT 594 Assignment 5: BNF Tokenizer and Parser
Spring 2013, David Matuszek


General idea

There are four parts to this assignment:

  1. Write a tokenizer for BNF, using a state machine.
  2. Implement a data structure for BNF (the design is provided for you).
  3. Write, on a separate data file, a BNF for a simple programming language.
  4. Write a program to read in and store the BNF, then generate random programs from it.

Part 1

enum TokenType

Define a TokenType enum with the values NONTERMINAL, TERMINAL, METASYMBOL, SEQUENCE, OR, OPTION, and ANYNUM.

class BnfTokenizer implements Iterator<Token>

Write a class BnfTokenizer with the following constructor and methods



class Token

Define a Token class, with the following constructor and methods

Note that all the "real work" is done by the BnfTokenizer class. TokenType is just an enum and needs no methods, while Token just has getter methods.

class BnfTokenizerTest

Thoroughly test (with JUnit 4) all your tokenizer methods.

Part 2

class BNF

An instance of the BNF class represents a complete BNF grammar, typically consisting of many rules. Each rule is represented as a tree.

Your data representation should be a HashMap<Token, Tree<Token>>. The keys are nonterminals; the values are the definitions of those nonterminals, represented as trees. In your tree representation, use the token types NONTERMINAL, TERMINAL, SEQUENCE, OR, OPTION, and ANYNUM, but not METASYMBOL.

Required methods of the BNF class

public void read(Reader reader)
Uses the BnfTokenizer to read in any number of grammar rules into this grammar.

Note: is an abstract class. Concrete subclasses include BufferedReader, which is good for reading from a file, and StringReader, which is good for reading from a String. You need the former in your ProgramGenerator class, and you should use the latter for unit testing.
public void write(Writer writer)
Writes out the entire grammar, in a way that it could be read in again by read. Note: See Do at least a small amount of unit testing of this method; the StringWriter class makes this possible. You don't have to do a lot, just enough to show that you can.
public Tree<Token> lookUp(String nonterminal)
Look up the nonterminal in the HashMap and return the Tree. This method should check whether the argument is enclosed in angle brackets, and add them if they are not already present; so lookUp("<while loop>") and lookUp("while loop") should both work.

A Program Grammar

Write a grammar for a really, really simple programming language. (You will never need to implement this language; just define the syntax for it.)

Keep your language really simple, but do try to make it something that it would be possible to use to write programs. Use all of the features of EBNF: Sequencing, alternatives, optional elements, and repeated elements.

Use <program> as your "top-level" nonterminal.

class ProgramGenerator

Write a main class ProgramGenerator that does the following:

ProgramGenerator should have (at least) the following methods:

JUnit and Javadoc

JUnit test all your methods except those that do I/O. For the BNF class, also test write.

Document all non-private methods. JUnit test methods don't have to be documented unless they do something non-obvious. Generate the Javadoc and turn it in as part of your assignment submission; but look it over before you turn it in, because you are likely to discover some problems that way

Due date

Zip up and turn in Part 1, including all test files and the generated Javadoc, via Canvas, before 6 AM Monday, February 18. Part 2 will be due a week later.