CIT 594 Assignment 6: Simple functional programming language
Spring 2013, David Matuszek

Purposes of this assignment

Minor change in tokenizer:

As mentioned in class, you need to enhance the provided Tokenizer class to recognize and return quoted strings (4 lines total).

What I didn't say is that you also need to add a new type to TokenType. Please use the name STRING.


Minor changes in grammar:

Changed from
     <value definition> ::= "val" <name> = <term>
to
     <value definition> ::= "val" <name> = <expression>

I had wanted to disallow things like val a = val b = c, but in doing so I accidentally
disallowed things like val a = b + c, which should be legal. The above change allows
both but is easily made.

The same error was made in the definition of factor; I've now changed the first alternative from
     <factor> ::= <name> [ "(" [ <term> { "," <term> } ] ")" ]
to
     <factor> ::= <name> [ "(" [ <expressions> } ] ")" ].
Thanks to the people who pointed out these problems!

General idea of the assignment

Implement a very small programming language. The language needs a name; we'll call it Funl.

You will need a tokenizer. Use this one: tokenizer.zip. You should replace the list of keywords (the language for this assignment has only six). You may make any other necessary modifications to the tokenizer, but try to keep changes to a minimum.

Part 1

Your first task is to write a program to parse programs in Funl, creating abstract syntax trees (ASTs), according to the BNF below. Use your Tree class from Assignment 3. Name your new class Parser.

BNF Tree representation
<program> ::= <function definition> { <function definition> }
Do not construct a tree for a <program>. Instead, construct an AST for each <function definition>, then remove the AST from the stack and save it in a HashMap<String, Tree<Token>>. The key will be the function name, and the value will be the entire function definition (including the name).
<function definition> ::=
    "def" <name> { <parameter> } "=" <expressions> "end"

The constructed AST will have def as its root, the function name as the first child, and $seq nodes as the second and third child. The children of the first $seq will be the parameters, and the children of the second $seq will be the expressions.

Example: def foo x y = x + y end

         def
        / | \
      foo |  \
        $seq $seq
         / \   |
        x   y  +
              / \
             x   y
<parameter> ::= <name>
The tree consists of a single node whose value is the name found in the Funl program.
<expressions> ::= <expression> { "," <expression> }
Parsing <expressions> results in a tree whose root is $seq and whose children are the expressions, in the order of occurrence. This is true even when there is only one expression.
<expression> ::= <value definition>
               | <term> { <add_operator> <term> }

An <expression> is represented the tree for one of the two alternatives.

For the second alternative, if the expression consists of a single term, then use the tree for that term. Otherwise use a tree whose root is the last add operator, the second child is the last term, and the right child is the tree representing the rest of the expression.

Example: a-b+c-d yields the tree

           -
          / \
         +   d
        / \
       -   c
      / \
     a   b

<value definition> ::= "val" <name> = <expression>
A <value definition> is a tree with root val, first child is the (actual) name, and second child is a tree representing the term..
<term> ::= <factor> { <multiply_operator> <factor> }

If the term is just one factor, it is represented by the tree for that factor. Otherwise it is represented by a tree whose root is the last multiply operator, the second child is the last factor, and the right child is the tree representing the rest of the expression.

Example: a*b*c yields the tree

        *
       / \
      *   c
     / \
    a   b

<factor> ::= <name> [ "(" [ <expressions> ] ")" ]
| "if" <expressions> "then" <expressions> "else" <expressions> "end"
| <number>
| "read" <quoted string>
| "(" <expression> ")"

The tree is one of:

  • If the name is followed by a "(", then the resultant tree should consist of the root $call with the actual name as the first child, and a $seq of terms as the second child. Otherwise the tree should consist of just a single node containing the name.
  • The root if with three children, each of which is a $seq.
  • A single node containing the number.
  • The root read with the quoted string as its single child.
  • A tree representing the expression (the parentheses are discarded).
<add_operator> ::= "+" | "-"
A single node whose value is "+" or "-".
<multiply_operator> ::= "*" | "/"
A single node whose value is "*" or "/".

Some explanations regarding the AST are in order.

Part 2

Write a REPL and interpreter for Funl. Details are here.

If you want to get an early start on the REPL, here's what it is supposed to do: Repeatedly read an <expression> from the user, parse it, call the interpreter to compute the value of the expression, print the value. The REPL should do very basic error handling (i.e. not crash), and there should be some way to quit.

Due date

6am Thursday, March 21 Tuesday, March 26, via Canvas. Zip your entire Java Project, including Tree and Tokenizer and related classes, along with your Parser and Interpreter classes.