CIT 594 BNF Example
Spring 2013, David Matuszek

The BNF assignment is correct as written. It's somewhat different than my usual assignments (hence part of my confusion), in that the parsing is done at a different stage.

I'll try to clarify things with an example.

  1. Write a tokenizer for BNF. You've done this part.
  2. Write a simple grammar for a simple programming language. I'll use BNF to define a language in which a program consists of some number of read statements, followed by some number of write statements. The read statement reads in one var at a time; the write statement can write out any number (but at least one) of vars.
    <program> ::= <reads> <writes>.
    <reads> ::= { read <var> }.
    <writes> ::= { write <var> {, <var>}}.
    <var> ::= a | b | c | d | e.
  3. There are two ways that a BNF grammar can be used:
    1. To recognize and parse programs in the language. You are supposed to write a parser for the BNF metalanguage. You are not (and this is where I got tangled up) suppose to write a parser for your simple programming language.
    2. To generate programs in the language. In the following steps, you will be generating programs in your simple programming language.
  4. Using your BNF tokenizer, write a recursive descent parser to read in and parse the BNF rules of the grammar for your simple programming language. Create a Map where each nonterminal is a key and the corresponding value is a tree representing its definition.
    Key value

    alternate value*
    (see explanation below)

    <program>
     
    <reads>
    <writes>
    <var>
     

    * According to the assignment, This leaves it unclear whether an ANYNUM node may have many children (in an implicit sequence), or only one. I prefer the first (fewer nodes overall), but the second is also plausible, so I will accept either.
  5. Starting with a list containing just the string "<program>", repeatedly expand each nonterminal in the list (that is, replace it by its definition) until there are no nonterminals left. For example,
    start with <program>
    expand <program> <reads><writes>
    expand <reads>

    read <var> read <var> read <var> <writes>

    expand some <var> read <var> read d read <var> <writes>
    expand some <var> read e read d read <var> <writes>
    expand <writes> read e read d read <var> write <var>,<var>
    expand some <var> read e read d read <var> write a, <var>
    expand some <var> read e read d read <var> write a, d
    expand some <var> read e read d read e write a, d
    quit, because no more nonterminals Write out the generated program.

    Things to note:
    1. I expanded nonterminals in a somewhat random order, just to show that the order doesn't matter. You will probably expand the first (or the last) nonterminal each time.
    2. It's easy to work with a list of terminals and nonterminals, then turn it into a string at the last minute. It'sharder to do all the work in a string.
    3. The program writes out a variable (a) that it has not read in. This cannot be corrected by BNF alone! BNF is a context-free grammar, which is another way of saying that you cannot use information outside the nonterminal to decide how to expand the nonterminal.