CIT 594 Assignment 10: Automata Spring 2006, David Matuszek

# Purposes:

• To give you some experience using graphs.
• To give you some experience with packages.
• To give you some experience with backtracking algorithms.
• To give you more experience with program design.

# General idea:

In this assignment you will implement a very simple type of "computer" with a complex name, a nondeterministic finite-state automaton (NFA). NFAs are graph structures.

Name your main class `NFA`. Use your very own Graph API (from the previous assignment) to read in the NFA graph and "walk" it to process input. Do not make any changes to your Graph API unless absolutely necessary (to fix bugs, or to correct serious inadequacies). Do not use the `Plex` class in your NFA classes; they should not "know" that this is how your graphs are implemented.

An automaton can be represented as a graph, with the nodes representing "states" of the automaton, and the edges representing "transitions" between states. Here are the usual conventions for drawing automata:   The (one) start state is indicated by an arrow pointing to it. This arrow does not indicate an edge in the graph! Transitions are represented by labeled edges. The label on the edge indicates which input will cause the transition to occur. Final states are represented by double circles. (Think of these as "goal nodes.")

Here's how a deterministic finite-state automaton (DFA) works. For each new input sequence,

1. Set the "current state" to the start state.
2. For each symbol in the input,
1. If there is an outpointing edge from the current state with that symbol as a label, follow the edge to get the new current state.
2. If there is no outpointing edge from the current state with that symbol as a label, terminate and reject the input.
3. When all symbols have been read, accept the input if the current state is a final state, and reject the input if it is not.

In a DFA, at each node there will be at most one outpointing edge with any given label. That means that the next state is completely determined by the input.

In an NFA,

• There may be more than one outpointing edge with any given label. That means you have to choose which edge to follow;
• Some edges may be labeled with `EMPTY`, meaning that you may choose to follow the edge, but you aren't required to. This does not use up an input symbol; and
• The input is accepted if any sequence of choices leads to a final state.

# Details of this assignment:

You will need to read in a description of an NFA and construct the corresponding graph. Here is an example of the input:

 ```STATE q0 q1 q2 SYMBOL + - 9 TRANSITION q0 + q1 q0 - q1 q0 EMPTY q1 q1 9 q2 q2 9 q2 START q0 FINAL q2 INPUT 9 9 9 INPUT + 9 + 9``` Here is a more precise definition of the required input:

• `STATE`, `SYMBOL`, `TRANSITION`, `START`, `FINAL`, and `INPUT` denote sections of the input. They must all occur, and must occur in this order. These words cannot be used as the names of nodes or the labels on edges.
• Each word following `STATE` is the name of a node on the graph.
• Each word following `SYMBOL` is an allowable edge label. Edge labels may be used on as many edges as desired.
• Words following `TRANSITION` must be in groups of three, and each group describes a directed edge on the graph, in the order: Name of source node, edge label, name of destination node.
• In addition, the edge label `EMPTY` may be used to denote an "empty transition."
• Exactly one node name must follow `START`; it is the name of the start state (node).
• Zero or more node names may follow `FINAL`; these are the final states. (Note: It is possible for the start state to also be a final state.)
• `INPUT` denotes a series of zero or more `SYMBOL`s to be given to the automaton to recognize.
• The word `INPUT` may occur more than once; each occurrence provides a separate, independent sequence of symbols to be given to the automaton. (In other words, finish processing the old input, and start again at the start state with the new input.)
• Capitalization is significant.
• Whitespace is not significant, except to separate words.
• "Words" may be composed of any non-whitespace characters.
• Input may consist of more than one NFA. Each will be started by the word `STATE`. When you see this word, throw away the old graph and reinitialize everything.

I'm trying to be very precise about the form of the input file, because your input method will need to be able to read our test cases. I will also supply some sample input data. We will test with correct input, so checking the input for errors is not necessary.

Use a `JFileChooser` to choose an input file to execute. Allow the user to do this as many times as he/she likes without having to restart your program.

Use Java 5's `Scanner` class to read each file. `Scanner` is a fairly powerful Sun-supplied tokenizer, but for this assignment the only "tokens" you need are "words" (sequences of printing characters) separated by whitespace. You can construct a `Scanner` to read from a `String` (as in the following example) or from a `File` (which will be more useful in this assignment).

Program Output
```import java.util.Scanner;

public class SimpleScanner {
public static void main(String[] args) {
String test = " This  is an  \t input\nString  ";
Scanner s = new Scanner(test);
while (s.hasNext()) {
System.out.println("\"" + s.next() + "\"");
}
}
}```
"This"
"is"
"an"
"input"
"String"

After reading in the graph, print it out.

For each input, test whether the input is accepted by the NFA, and print the result (whether accepted or rejected). Do the test with a recursive backtracking algorithm.

If the input is accepted, print a trace of the successful path taken. A trace consists of a sequence of node and edge labels (or equivalently, a sequence of node labels alternating with inputs). For example: Input: `+ 9 9`   Trace: `q0 + q1 9 q2 9 q2`

If you start printing the trace at the lowest level of a successful recursion, and print at each level as you return, the trace will come out backwards. The simplest solution is this: Instead of printing as you return, put the result in a global stack, and print the stack after you exit from the recursion.

As a reminder, here is how backtracking works:

```boolean explore(N) {
If N is a goal node, return “success”
If N is a leaf node, return “failure”
For each child C of N,
If explore(C) return “success”
Return “failure”
}```

To simplify the assignment, you can assume that there will be no cycles in the graph that contain only empty transitions. In other words, you don't have to worry about getting into an infinite loop; the only way you can get from a node back to itself is by consuming at least one input symbol.

# Package structure:

In Eclipse, create a new package named `graph`. Drag all your Java files (both `Plex` and graph) into this new package. (This will add a `package` directive as the first line of each file). Next, change every method in the `Plex` class from `public` to `protected` (the only thing left `public` should be the `value` field.) Put your `NFA` classes in a separate package named `nfa`, and` import graph.*` into each NFA class as needed.

# Due date:

Monday, April 24, before midnight. No extensions except for serious reasons. Turn in the complete project (including both `nfa` and `graph` packages) in the usual way.