CIT 594 Assignment 5: Abstract Syntax Tree
Spring 2011, David Matuszek

Purposes of this assignment

General idea of the assignment

Write a program that takes as input a "program" in a simple programming language, and produces an Abstract Syntax Tree (AST) representation of that program. The tree is "abstract" because some information is thrown away--specifically, a lot of punctuation whose only purpose is to help define the structure of the tree. For example,

Java AST Comments
if (x > y)
    max = x;
else
    max = y;
AST for IF statement

Discarded because implicit in the tree structure:

  • parentheses
  • semicolons
  • the word else

Details

Call your project and your main class LittleLanguage.

The language

Any program, in any programming language, can be described as a tree structure. The primary purpose of most syntax is to help describe that structure. The language you will be implementing--it doesn't yet have a name--is designed to have a very simple syntax, so that it is easy to work with. In fact, the syntax consists of only three elements:

A program consists of a sequence of elements. Whitespace is irrelevant, except it must be used as a separator between adjacent words and numbers (though brackets need not be surrounded by whitespace).

Although we aren't talking about the semantics, or meaning, of our programming language yet, you can assume that the elements are evaluated one after the other, in the order they appear.

Every word and number is represented by a node in an abstract syntax tree. If a word stands for a function, its node will have as many children as there are arguments to the function. For example, the Java if statement above can be represented in our little language as

if-else greater x y set max x set max y

Here's how to make sense of the above line. Assume that if-else is a function that takes 3 arguments; greater and set are functions that each take 2 arguments; and x, y, and max are variables (functions with 0 arguments). Then:

The Hash Map

You will need to be able to look up, for each word, how many arguments it takes. For this, use a HashMap<String, Integer>.

This language has no keywords, only functions with various arities (number of arguments). These could be read in from a file or (preferably) just hard-wired into your program. Here is a tentative initial list:

Functions (words) with one argument: negate, not, abs, do, read, print.

Functions with two arguments: set, if, while, plus, minus, times, divide, and, or, less, equals, greater.

Functions with three arguments: if-else, define.

Functions with four or more arguments: None yet, but there may be some later.

A block may have an arbitrary number of children, but this is a special case that is taken care of in the syntax by enclosing the children in brackets. The program as a whole is also treated as a block.

Any unrecognized word (one not in the HashMap) will be considered to have zero arguments.

Due to the simplicity of the grammar, it should be trivial to add or remove functions at any time, just by modifying an entry in the hash map.

The Tree class

Define a class Tree to represent a node in a tree. Each Tree object should have a String value field and a private ArrayList<Tree> children (or, if you prefer, a LinkedList<Tree> children) field. You will want to add methods to this class, to manipulate Tree objects; the word private is to keep code in other classes from messing around with the tree structure.

Last year in CIT 594 I assigned a complete Tree API, which asked for a general and relatively complete set of operations for working with trees. If you aren't clear about what kind of operations you will need, you might like to look at this assignment. However, don't implement a complete set of Tree methods, just the ones you need. Some of the methods from last year are quite difficult and, for this assignment, totally unnecessary.

Since you will need to display the results of your parse, you will need a method something like toLongString(); however, the details may be quite different.

The AST

In the AST you construct, each Tree node will represent either:

I said earlier that "A program consists of a sequence of elements," so it's like a block. Use the String "[]" as the value of the root node (but we won't actually put brackets around the entire program), and the elements as the children of the root.

Input/Output

Provide a GUI. You should be able to type a "program" into the GUI, or read one from a file. You should be able to save a program from the GUI into a file

By clicking a button labeled Parse, you should cause your program to construct an AST from the Little Language program and display it (or display some kind of error message if you can't construct the AST).

Remember that you should not have methods that both do significant computation and also do I/O. This will be important in the next assignment, where we are likely to use a totally different GUI, or none at all.

Parsing the language

For various reasons, you should keep an unaltered copy of the original input.

To parse the input (that is, construct a Tree from it), I recommend that you first make it more regular by ensuring that brackets are surrounded by whitespace, use the String method split to break the input up into an array of individual tokens (words, numbers, and brackets). If you would prefer to work with a list rather than an array (this would let you use a ListIterator, which might be convenient), you can use the Arrays.asList method on the result..

Since each function takes a certain number of arguments, but each argument may be many tokens long, I recommend writing a method that, given a token at a starting point in the input, builds a Tree from that token plus its arguments. Each argument will itself be a Tree (maybe a one-node Tree, maybe much larger), so your method needs to be recursive.

For example, with the string if-else greater x y set max x set max y, your method would recognize that if-else takes three arguments, so it would call itself three times; the first such call would start with the word greater, which takes two arguments, so that instantiation of the method would call itself two times, to get x and y; and so on.

Blocks, enclosed in brackets, need to be handled as a special case.

JUnit testing

Yes, absolutely! All the Tree methods, the parse method(s), etc. But not the GUI itself.

Grading

I have not described in detail how to organize your program, what to name your methods, etc. That means that we can't use our own unit tests to grade your program, we just have to run it (and look at your code, and run your unit tests). For this reason, use the above functions and arities mentioned in The Hash Map section above, so that we can test your program with our own LittleLanguage programs.

Remember all the requirements from Assignment 1:

Due date

Turn your assignment in to Blackboard by 6AM Friday, February 18.