CIT 594 Assignment 5: Abstract Syntax Tree
Spring 2011, David Matuszek

# Purposes of this assignment

• To get you familiar with the Tree data structure
• To demonstrate how programs can be represented as Trees
• To get you started using Hash Tables (actually, `HashMap`s)

# General idea of the assignment

Write a program that takes as input a "program" in a simple programming language, and produces an Abstract Syntax Tree (AST) representation of that program. The tree is "abstract" because some information is thrown away--specifically, a lot of punctuation whose only purpose is to help define the structure of the tree. For example,

``````if (x > y)
max = x;
else
max = y;``````

Discarded because implicit in the tree structure:

• parentheses
• semicolons
• the word `else`

# Details

Call your project and your main class `LittleLanguage`.

## The language

Any program, in any programming language, can be described as a tree structure. The primary purpose of most syntax is to help describe that structure. The language you will be implementing--it doesn't yet have a name--is designed to have a very simple syntax, so that it is easy to work with. In fact, the syntax consists of only three elements:

• Words. A word stands for a function or a variable.
• Every function has a fixed number of arguments.
• A variable is the same as a function with zero arguments.
• Numbers. Numbers can be written as integers or reals.
• Blocks. A block is any number of elements (including, possibly, other blocks) enclosed in` [ ] `brackets.

A program consists of a sequence of elements. Whitespace is irrelevant, except it must be used as a separator between adjacent words and numbers (though brackets need not be surrounded by whitespace).

Although we aren't talking about the semantics, or meaning, of our programming language yet, you can assume that the elements are evaluated one after the other, in the order they appear.

Every word and number is represented by a node in an abstract syntax tree. If a word stands for a function, its node will have as many children as there are arguments to the function. For example, the Java` if `statement above can be represented in our little language as

`if-else greater x y set max x set max y `

Here's how to make sense of the above line. Assume that` if-else `is a function that takes 3 arguments; `greater` and `set` are functions that each take 2 arguments; and `x`, `y`, and `max` are variables (functions with 0 arguments). Then:

• `if-else `takes 3 arguments, so
• the first argument must start with `greater`; but `greater` is a function that takes 2 arguments, so
• the first argument to `greater` is `x`, which takes no additional arguments
• the second argument to `greater` is `y`, which takes no additional arguments
• the second argument must start with `set`; but `set` is a function that takes 2 arguments, so
• the first argument to `set` is `max`, which takes no additional arguments
• the second argument to`set `is `x`, which takes no additional arguments
• the third argument must start with` set`; but` set `is a function that takes 2 arguments, so
• the first argument to ` set ` is `max`, which takes no additional arguments
• the second argument to` set `is `y`, which takes no additional arguments

## The Hash Map

You will need to be able to look up, for each word, how many arguments it takes. For this, use a `HashMap<String, Integer>`.

This language has no keywords, only functions with various arities (number of arguments). These could be read in from a file or (preferably) just hard-wired into your program. Here is a tentative initial list:

Functions (words) with one argument: `negate`, `not`, `abs`, `do`, `read`, `print`.

Functions with two arguments: `set`,` if, while`, `plus`, `minus`, `times`, `divide`, `and`, `or`, `less`, `equals`, `greater`.

Functions with three arguments: `if-else`, `define`.

Functions with four or more arguments: None yet, but there may be some later.

A block may have an arbitrary number of children, but this is a special case that is taken care of in the syntax by enclosing the children in brackets. The program as a whole is also treated as a block.

Any unrecognized word (one not in the HashMap) will be considered to have zero arguments.

Due to the simplicity of the grammar, it should be trivial to add or remove functions at any time, just by modifying an entry in the hash map.

## The Tree class

Define a class `Tree` to represent a node in a tree. Each `Tree` object should have a `String value` field and a `private ArrayList<Tree> children` (or, if you prefer, a `LinkedList<Tree> children`) field. You will want to add methods to this class, to manipulate Tree objects; the word `private` is to keep code in other classes from messing around with the tree structure.

Last year in CIT 594 I assigned a complete Tree API, which asked for a general and relatively complete set of operations for working with trees. If you aren't clear about what kind of operations you will need, you might like to look at this assignment. However, don't implement a complete set of Tree methods, just the ones you need. Some of the methods from last year are quite difficult and, for this assignment, totally unnecessary.

Since you will need to display the results of your parse, you will need a method something like `toLongString()`; however, the details may be quite different.

## The AST

In the AST you construct, each Tree node will represent either:

• A function; the `value` field will hold the name of the function (such as `set` or `x`), and the `children` of the node will be the arguments to that function.
• A number; the `value` field will hold the `String` representation of the number, and the node will have no children.
• A block; the `value` field will hold the String `"[]"`, and the node may have any number of children.

I said earlier that "A program consists of a sequence of elements," so it's like a block. Use the String `"[]"` as the value of the root node (but we won't actually put brackets around the entire program), and the elements as the children of the root.

## Input/Output

Provide a GUI. You should be able to type a "program" into the GUI, or read one from a file. You should be able to save a program from the GUI into a file

By clicking a button labeled Parse, you should cause your program to construct an AST from the Little Language program and display it (or display some kind of error message if you can't construct the AST).

Remember that you should not have methods that both do significant computation and also do I/O. This will be important in the next assignment, where we are likely to use a totally different GUI, or none at all.

## Parsing the language

For various reasons, you should keep an unaltered copy of the original input.

To parse the input (that is, construct a Tree from it), I recommend that you first make it more regular by ensuring that brackets are surrounded by whitespace, use the String method `split` to break the input up into an array of individual tokens (words, numbers, and brackets). If you would prefer to work with a list rather than an array (this would let you use a `ListIterator`, which might be convenient), you can use the `Arrays.asList` method on the result..

Since each function takes a certain number of arguments, but each argument may be many tokens long, I recommend writing a` `method that, given a token at a starting point in the input, builds a Tree from that token plus its arguments. Each argument will itself be a Tree (maybe a one-node Tree, maybe much larger), so your method needs to be recursive.

For example, with the string `if-else greater x y set max x set max y`, your method would recognize that `if-else `takes three arguments, so it would call itself three times; the first such call would start with the word `greater`, which takes two arguments, so that instantiation of the method would call itself two times, to get `x` and `y`; and so on.

Blocks, enclosed in brackets, need to be handled as a special case.

# JUnit testing

Yes, absolutely! All the Tree methods, the `parse` method(s), etc. But not the GUI itself.

I have not described in detail how to organize your program, what to name your methods, etc. That means that we can't use our own unit tests to grade your program, we just have to run it (and look at your code, and run your unit tests). For this reason, use the above functions and arities mentioned in The Hash Map section above, so that we can test your program with our own LittleLanguage programs.

Remember all the requirements from Assignment 1:

• Estimate how long this project will take you, and keep track of your actual time.
• Provide JUnit tests, and try to do Test Driven Design.
• Use good programming style.