Copyright © 1997 Xerox Corporation. All rights reserved.
Xerox Finite-State Tool
Lauri Karttunen, Tamás Gaál, and André Kempe
Version 5.9.0
Abstract: Xerox finite-state tool is a general-purpose
utility for computing with finite-state networks. It enables the user to
create simple automata and transducers from text and binary files, regular
expressions and other networks by a variety of operations. The user can
display, examine and modify the structure and the content of the networks.
The result can be saved as text or binary files.
Introduction
XFST is a general-purpose Unix application for computing with finite-state
networks, based on long-term Xerox research initiated by Ronald M. Kaplan
and Martin Kay at the Xerox Palo Alto Research Center in the early 1980s.
It is a successor to two earlier interfaces of the same type: IFSM and
FSC. The IFSM interface was created at PARC by Lauri Karttunen and Todd
Yampol around 1990-92. The FSC tool was developed at XRCE in 1994-95 by
Pasi Tapanainen. XFST combines the best features of its two predecessors.
It includes many extensions to the regular expresson calculus introduced
at XRCE in the last few years.
XFST can read finite-state networks from binary files and compile them
from regular expressions and text files. The networks can be simple networks
or finite-state transducers. They can be combined by means of a variety
of operations, such as union and composition. The resulting networks can
be saved into a binary or a text file. The user may apply a network to
strings to determine whether the string is accepted by the network or to
transform it to another string if the network is a transducer. XFST provides
many ways to get information about a network and to inspect and modify
its structure.
We introduce XFST with a brief tour that shows how the application is
launched in Unix and some of the things that it can do. In the later sections
we discuss the command set and other related issues in more depth. For
help with regular expressions and other issues concerning the structure
and application of finite-state networks, please consult our documentation
on these special topics:
Running XFST
XFST is launched with the command 'xfst'. The launch command can be modified
by a number of optional flags. For example, 'xfst -f scriptfile' executes
the commands in scriptfile and exits upon completion. If no additional
modifiers are given, the 'xfst' command starts the application in an interactive
loop waiting for commands from the user:
> xfst
Copyright © Xerox Corporation 1997
Xerox Finite-State Tool, version 5.6.6
Type "help" to list all commands available or "help help" for further
help.
xfst[0]:
The xfst[0]: prompt indicates that the xfst application is waiting
for a command. The number 0 indicates that the network stack is empty.
The great majority of XFST commands are about adding networks to the
stack, replacing some or all of the stack by the result of some operation,
and saving the stack into a file. Another set of commands is concerned
with the network currently on the top of the stack, that is, the network
that was most recently added to the stack. We give a few simple examples
of both types of commands.
We present two examples. In the first example, the user creates a network
with the help of XFST and saves the result into a file. In the second example,
we retrieve the network from the file to use in another operation. We then
show how a sequence of commands can be run by a script and aliased to a
single command.
Making and saving a network
A network can be added to the stack by loading a previously compiled network
from a binary file or by compiling a new one from some text source. In
either case, the network becomes the topmost one on the stack. In this
example, we compile a network from a regular expression using the command
'read regex'. We type
xfst[0]: read regex [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
This expression denotes the language that consists of the ten decimal digits.
Because 0 is a special symbol (epsilon) in a regular expression, it is
necessary to prefix it here with %, the escape character, to have it interpreted
as a digit. The semicolon at the end of the line closes the regular expression.
When the command is terminated with a carriage return, XFST responds:
2 states, 10 arcs, 10 words.
xfst[1]:
showing that the network representing this ten-word language consists of
2 states and 10 arcs. The new prompt, xfst[1]: shows that we now
have one network on the stack. The command 'print net' displays the structure
of the network on the screen:
xfst[1]: print net
Sigma:
0 1 2 3 4 5 6 7 8 9
Size: 10
Net: ["0" | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
Flags: deterministic, pruned, minimized, epsilon_free
Arity: 1
s0: 0 -> fs1, 1 -> fs1, 2 -> fs1, 3 -> fs1, 4 -> fs1, 5 -> fs1,
6 -> fs1, 7 -> fs1, 8 -> fs1, 9 -> fs1.
fs1: (no arcs)
xfst[1]:
Here the 'print net' command displays the states of the network: s0
(a non-final state), fs1 (a final state); and the labeled arcs
leading from s0 to fs1. In addition, we see the symbol
alphabet of the network (Sigma), the regular expression it was compiled
from, and some characteristics of the network (Flags, Arity).
It is often convenient to give a network a name that can be used in
a regular expression to refer to it. The command for that assignment is
'define':
xfst[1]: define Digit
xfst[0]:
The 'define' command requires at least one argument: the symbol that is
being defined, here 'Digit'. If no further specification is given, the
network on the top of the stack becomes the value of the defined symbol
and is removed from the stack.
Another way to achieve exactly the same result is to include in the
'define' command a regular expression that denotes the desired language
or relation. In this case, the single command
xfst[0]: define Digit [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
2 states, 10 arcs, 10 words.
xfst[0]:
would have exactly the same effect as the sequence of commands shown above.
Here the stack remains empty. Note the closing semicolon that marks the
end of the regular expression.
Once defined, the name 'Digit' can be used in regular expressions to
represent the language in question. To give a simple example, let us construct
a transducer that converts US numerals to the European format. In US numerals
the comma is used as a separator, the period marks the beginning of the
decimal part. In Europe the convention is the opposite. Thus "1,000.00"
in the US corresponds to "1.000,00" in Europe. A transducer that does this
conversion can be defined as follows, using the defined 'Digit' symbol:
xfst[0]: read regex %. -> %, , %, -> %. || Digit _ Digit ;
4 states, 41 arcs, Circular.
xfst[1]:
This transducer represents the parallel replacement of "." by "," and ","
by "." between two digits.
To verify that the transducer does what it is supposed to do we can
use the 'apply' command. Because transducers are bidirectional, we must
specify the direction of application. In this case, it is 'down'; that
is, the US representation is on the "upper" side of the transducer:
xfst[1]: apply down 1,234.99
1.234,99
xfst[1]:
The 'apply' command may also be used to take the input strings from a file
instead of typing them directly. The file is processed line by line:
fst[1]: apply down < US-num.txt
Opening file US-num.txt...
apply down> 1,234.99
1.234,99
apply down> 0.5
0,5
apply down> 100,000,000
100.000.000
Closing file US-num.txt...
In order to have the transducer available in the future, we can save it
to a file. The command 'save' writes all the networks currently on the
stack into a single file. In this case, the file will contain just one
network:
xfst[1]: save stack US-to-EU-num.fst
xfst[1]:
The command 'quit' exits from XFST and returns back to Unix:
xfst[1]: quit
>
However, instead of quitting, we move on to the next topic.
Loading and using a stored network
In this example, we load the network we just created from a file to the
stack, add another network on the top of the first one, and then perform
an operation to replace both of them with the result of that operation.
Let us first empty the stack with 'clear stack' and then load the network
back from the file.
xfst[1]: clear stack
xfst[0]: load US-to-EU-num.fst
xfst[1]:
We will now create another network by compiling a simple network from the
same little text file we already used above
xfst[1]: read text < US-num.txt
Opening file US-num.txt...
Closing file US-num.txt...
20 states, 21 arcs, 3 words.
xfst[2]:
The 'read text' command expects as its argument a name of a file containing
a list of words, one entry per line. It compiles the word list into a network.
The command 'print words' displays the content of the compiled word list.
xfst[1]: print words
1,234.99
100,000,000
0.5
xfst[2]:
As the prompt 'xfst[2]:' indicates, we now have two networks on the stack.
We can get more information about the stack with the 'print stack' command:
fst[2]: print stack
0: 20 states, 21 arcs, 3 words.
1: 4 states, 41 arcs, Circular.
fst[2]:
The positions in the stack are numbered from 0 upwards. The net in position
0, the three-word network, is on top of the stack. Unary commands such
as 'print net' and 'print words' apply to the top network on the stack,
if not followed by some defined name. The net below it, in position 1,
is the transducer converting US-style numbers to European-style numbers.
Instead of applying the transducer to strings one at the time, as we
did before with the 'apply down' command, we can now transduce the entire
list of words at once with the 'compose net' command:
fst[2]: compose net
20 states, 21 arcs, 3 words.
fst[1]:
The compose operation replaces the two networks on the stack by the result
of the operation. Thus we now have just one network left. It is instructive
to view its contents using the same 'print words' command as before.
fst[1]: print words
0<.:,>5
100<,:.>000<,:.>000
1<,:.>234<.:,>99
fst[1]:
The result of the composition is a transducer. It denotes a relation, a
mapping from one regular language into another one. On its "upper side",
the transducer has the three original US-style numbers, each mapped to
a corresponding European-style on the "lower side" of the transducer. For
the most part, the mapping is an identity relation because each digit is
mapped to itself. The only difference is that periods are mapped to commas,
and vice versa. The angle brackets indicate pairs of non-identical symbols:
<.:,> indicates the correspondence of an upper-side period
with a lower-side comma; similarly for <,:.>.
We can view the upper and lower languages of the relation independently.
print upper-words displays the three US numbers; print lower-words
shows what they have been transduced into:
fst[1]: print lower-words
0,5
100.000.000
1.234,99
fst[1]:
As before, can use the 'apply' commands to map strings on one side of the
transducer to the corresponding strings on the other side. For example,
'apply up' maps any one of the three lower-side European numbers to the
corresponding upper-side string.
fst[1]: apply up 0,5
0.5
We can also extract one of the other language from the relation. The command
'lower-side net' extracts from the transducer a simple automaton that contains
just the three European numbers.
fst[1]: lower-side net
20 states, 21 arcs, 3 words.
fst[1]: print words
1.234,99
100.000.000
0,5
fst[1]:quit
Running XFST with a script
The examples we have given so far are about using XFST in an interactive
mode, with the user typing commands on the prompt line. It is more convenient,
for many purposes, to write a list of commands to be run in batch mode
without any user interaction. To illustrate this possibility, we write
a script that compiles the US-to-European transducer. and uses it to produce
a file of European-style numbers from a file of US-style numbers.
A script is an ordinary text file that can be prepared with any text
editor, such as Emacs. Here we create the script with the help of the simple
Unix 'cat' command. The command 'cat > xfst.script' writes the rest of
the text up to the closing ^D (control-D) into the file xfst.script.
cat > xfst.script
# A script for producing a file of European numbers
# from a file of US numbers.
echo >> Defining Digit
define Digit [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
echo >> Defining number converter
read regex %. -> %, , %, -> %. || Digit _ Digit ;
echo >> reading US numbers
read text < US-num.txt
echo >> Composing and extracting EU numbers
compose net
lower-side net
echo >> Creating EU-num.txt
write text EU-num.txt
system ls -l EU-num.txt
echo >> Done.
quit
^D
The lines beginning with # are comments that are ignored by xfst.
The 'echo' command prints the rest of the line to the standard output when
the script is executed. In complex scripts it is often useful to print
progress reports during the execution.
Because the purpose of this script is to produce an output file, we
verify at the end that the file has indeed been created. The 'system' command
sends the rest of the line to the Unix operating system. If the file is
in the directory, the script was run successfully
To execute the script, we invoke xfst with the -f flag:
> xfst -f xfst.script
>> Defining Digit
2 states, 10 arcs, 10 words.
>> Defining number converter
4 states, 41 arcs, Circular.
>> reading US numbers
Opening file US-num.txt...
Closing file US-num.txt...
20 states, 21 arcs, 3 words.
>> Composing and extracting EU numbers
20 states, 21 arcs, 3 words.
20 states, 21 arcs, 3 words.
>> Creating EU-num.txt
-rw-r--r-- 1 lk xerox 25 May 30 13:09 EU-num.txt
>> Done.
bye.
The -f flag causes xfst to exit after the script has been run.
If the user wishes to take over at that point and continue running xfst
in an interactive mode, he can call the script with -l flag instead
of -f.
Another non-interactive way of running xfst is to include a sequence
of xfst commands directly on the Unix command line. Each xfst command must
be enclosed in double quotes and preceded by an -e flag. If the
final -stop flag is omitted, xfst executes the commands and continues
running in an interactive mode.
> xfst -e "define Digit [%0|1|2|3|4|5|6|7|8|9] ;" \
-e "read regex %. -> %, , %, -> %. || Digit _ Digit;" \
-e "read text < US-num.txt" \
-e "compose net" \
-e "lower-side net" \
-e "write text EU-num.txt" -stop
Note the backslash at the end of each line. It signals to to the Unix operating
system that the entire sequence of commands counts as a single command
line.
Defining aliases
XFST allows the user to create simple names for more complex commands.
For example,
alias dir system ls -l *.txt
creates a new XFST command 'dir' that has the same effect as 'system ls
-l *.txt', that is, it sends to the Unix operating system the command 'ls
-l *.txt'. The chosen alias must be a single word with no hyphens, underscores,
or other special characters. The command 'print alias' lists all the current
aliases and their definitions.
An alias can represent an arbitrary sequence of commands. To create
such an alias, the user first types only the name to be defined.
xfst[0]: alias ConvertAndShow
XFST responds by prompting the user for commands. The list can be terminated
by ^D (control-D) or by a special symbol, END;, with no extra whitespace
around it:
alias> define Digit [%0|1|2|3|4|5|6|7|8|9] ;
alias> define Conv %. -> %, , %, -> %. || Digit _ Digit ;
alias> read regex [@txt "US-num.txt" .o. Conv].l ;
alias> echo
alias> echo >> Here is the result:
alias> print words
alias> END;
With slightly different means, the above definition of the alias 'ConvertAndShow'
produces the same end result as our previous examples: a network of European-style
numbers obtained by transduction from a US-style source file. We define
the conversion relation, Conv, by means of the earlier definition of Digit.
Instead of the command 'read text' to compile the word list in US-num.txt,
we use the regular expression operator @txt to bring about the
same effect. Similarly, in place of the XFST commands 'compose net' and
'lower-side net', we employ the corresponding operators in the regular
expression calculus: [@txt "US-num.txt" .o. Conv].l denotes the
lower side language (.l) of the composition (.o.) of the US numbers
with the conversion relation.
Now XFST interprets ConvertAndShow as a command to execute the defined
set of commands:
xfst[0]: ConvertAndShow
2 states, 10 arcs, 10 words.
4 states, 41 arcs, Circular.
Opening file US-num.txt...
Closing file US-num.txt...
20 states, 21 arcs, 3 words.
20 states, 21 arcs, 3 words.
>> Here is the result:
1,234.99
100,000,000
0.5
xfst[1]:
A convenient way to extend the xfst command language is to place such alias
definitions in a script file that is executed with with the -l
flag as the application is launched.
Overview of Commands
This section contains a general overview of xfst commands. A complete alphabetical
list, with explanations, is found in the next section.
Command Syntax
XFST commands are in general of the form '<command> <type or
object>', where <command> specifies the operation to be performed
and the second term, if any, gives some additional specification about
the type of the operation or the object it applies to. For example, there
are several variants of the 'print' command: 'print net', 'print sigma',
'print words', etc.
All display commands and all unary operations, such as 'lower-side net',
apply to the network on the top of the stack. For example, 'print words'
enumerates the paths of the topmost network only, although there may be
others on the stack. Some commands, such as 'print net' and 'print words',
can be followed by a name of network which has been given a name with the
'define' command, as shown below:
xfst[0]: read text
text> two
text> words
text> ^D
8 states, 8 arcs, 2 words.
fst[1]: define W2
fst[0]: print words W2
two
words
xfst[0]:
Virtually all XFST commands can be abbreviated to a single word command.
For example, the 'print' part of all print commands can be dropped. Thus
'sigma' as a command has the same effect as 'print sigma'. Similarly, 'regex'
and 'read regex' are equivalent.
Short command names are convenient when one is working in an interactive
mode. On the other hand, for scripts we recommend using the long commands
because it tends to make the scripts more perspicuous.
Command Classes
The FST commands can be grouped into five classes. We describe first in
general terms the five classes and list the commands in each class. A more
detailed description of each command is given in an alphabetical listing
in the next section.
-
Input/Output and Stack Commands
-
move networks onto and off the stack, save the stack or the top network
into a file in various formats. We use the commands 'load' and 'save' for
binary files, 'read' and 'write' for text files. 'read regex' and 'define'
compile a network from a regular expression; 'read regex' pushes the result
on the stack, 'define' assigns it to the chosen name.
The 'read' commands take the input from the console (stdin)
unless the user specifies another source by means of < symbol and a
file name. For example, 'read text < /usr/dict/words' compiles the list
in /usr/dict/words.
Similarly, the 'write' commands send the output to the console
(stdout) unless the command is followed by > and a name of a
target file. It is also possible to send the output directly to a
printer. For example, 'write text | lp' uses the | convention
in Unix to pass the output to the 'lp' print command.
|
Binary
input/output
|
load stack, save stack, load defined, save defined |
|
Text
input/output
|
read prolog, read spaced-text, read text, write prolog, write spaced-text,
write text. |
|
Regular
expressions
|
define, read regex, undefine |
|
Stack
commands
|
clear stack, pop stack, push defined, turn stack, rotate stack |
-
Display commands
-
give information about the content and properties of the network on the
top of the stack, the language or relation it represents, the state of
the system, and the available commands.
Like the 'write' commands in the previous section, the 'print' commands
display the output on the console, unless the user directs it to a file
or a printer. The 'print' commands do not have a corresponding input command.
|
Stack and network
information
|
print eqv-labels, print flags, print label-tally, print net, print
name, print size, print sigma, print labels, print stack, inspect net |
|
System information
|
print aliases, print file-info, print defined, print directory, print
storage |
|
Network
content
|
print random-lower, print random-upper, print words, print lower-words,
print upper-words, print nth-upper, print nth-lower, print num-upper, print
num-lower |
|
Apply network
to a string
|
apply up, apply down |
|
Help
|
apropos, help |
|
Other
|
echo |
-
Tests of network properties
-
Unary tests check whether the network on the stack has certain properties.
Binary tests compare the languages of the two top network on the stack.
|
Unary
|
test lower-bounded, test upper-bounded, test lower-universal, test
upper-universal |
|
Binary
|
test equivalent, test overlap, test sublanguage |
-
Operations on networks
-
produce a network that represents a new language or relation by modifying
the top network, two top networks, or all the networks on the stack. The
affected networks are removed from the stack and replaced by the result
of the operation.
Property list commands modify or display the attribute-value pairs
on the top network's property list. Maintenance commands perform
various housekeeping operations of the top level network structure
without changing the language or the relation it represents.
|
Unary
operations
|
invert net, label net, lower-side net,
negate net, one-plus net, reverse net, sigma net,
substitute defined, substitute label,
substitute symbol, substring net, upper-side net,
zero-plus net |
|
Binary
operations
|
crossproduct net, minus net |
|
N-ary
operations
|
compose net, concatenate net, intersect net, shuffle net, union net |
|
Property
list
|
add properties, edit properties, name net, read properties, write
properties |
|
Maintenance
|
complete net, determinize net, eliminate flag, epsilon-remove net,
minimize net, prune net, sort net, compact sigma, optimize net, unoptimize
net |
-
System commands
-
change the internal configuration of XFST or interact with the Unix operating
system.
|
Variables
|
show, set |
|
Aliases
|
alias |
|
System call
|
system |
|
Execute script
|
source |
|
Other
|
quit |
Alphabetical Index
This index consists of three parts. The first index lists all the currently
implemented XFST commands. The second list describes the variables that
the user can manipulate with the XFST 'set' command. The final part explains
the correspondence between certain XFST commands and certain operators
in the regular expression calculus. The XFST commands involved are the
'load' and 'read' commands and all commands that construct a new network
by means of an operation on one or more networks on the stack.
List of Commands
This list gives a brief description of all the currently implemented XFST
commands. This information is also available on-line in XFST with the 'help'
command. For example, 'help alias' prints out the same text as shown below.
-
add properties
add properties < source-file
-
reads a set of attribute-value pairs and adds them to the property list
of the top network on the stack. Each line in the source file must contain
one attribute-value pair. For example,
COPYRIGHT: "Xerox Corporation"
DATE: "June 10, 1997"
VERSION: 9
The attributes, each terminated by a colon, must be strings (upper case
recommended). The values may be strings or integers. String values must
be enclosed in double quotes: "9" and 9 are considered as different values.
If an attribute to be added already exists on the property list of the
network, 'add properties' will reset it to a new value.
See also: 'read properties', 'write-properties'.
-
alias
alias name
alias name command-sequence
-
'alias' defines as a shorthand for the given sequence of XFST commands.
For instance, the new command 'ls' can be created in the following way:
alias ls print directory
Alias may contain several commands. A new command 'lexr', for reading a
lexicon, can be created in the following way:
fst[0]: alias lexr
alias> load lexicon.fst
alias> print properties
alias> END;
A multi-command alias may also be defined on one line as follows:
alias lexr load lexicon.fst; print properties
See also: 'print aliases'.
-
apply up
apply up line
apply up < source-file
-
Simulates the composition of the input string with the lower side of the
top network on the stack and extracts the result from the upper side. If
the command 'apply up' is followed by a carriage return,the program prompts
the user with apply>, prints the result and gives a prompt for
a new input. Control-D exits from this mode.
-
apply down
apply down line
apply down < source-file
-
Simulates the composition of the input string with the upper side of the
top network on the stack and extracts the result from the lower side. If
the command 'apply down' is followed by a carriage return, the program
prompts the user with apply>, prints the result and gives a prompt for
a new input. Control-D exits from this mode.
-
apropos word
-
prints brief information about the commands relating to the given word.
If you need further information about a command, use the command 'help
command-name'.
-
clear stack
-
Removes all networks on the stack.
-
compact sigma
-
all symbols that have the same distribution as the unknown symbol are
eliminated from the alphabet and the corresponding redundant arcs are
removed from the network.
-
complete net
-
extends the top network until it has a transition for every symbol in sigma
in every state. All new transitions lead to a failure state, thus the language
of the network is preserved.Complete is defined only for simple networks
(arity = 1), not for transducers.
-
compose net
-
replaces the stack with the composition of all networks currently on the
stack. If the stack consists of two networks, defined as A and B, with
A on top of the stack, the result is equivalent to compiling the regular
expression [A .o. B].
-
concatenate net
-
replaces the stack with the concatenation of all networks currently on
the stack. If the consists of two networks, defined as A and B, with A
on top of the stack, the result is equivalent to compiling the regular
expression [A B]. In regular expressions, concatenation is indicated simply
by whitespace.
-
crossproduct net
-
returns the crossproduct of the first two networks on the stack. They must
not be transducers. If the stack consists of two networks, defined as A
and B, with A on top of the stack, the result is equivalent to compiling
the regular expression [A .x. B].
-
define symbol
define symbol regular-expression
-
binds the given symbol to a network compiled from the regular
expression. Every subsequent occurrence of the symbol in a regular
expression is interpreted as the name of the corresponding language or
relation. To unbind the symbol use 'undefine'. If the regular
expression is omitted, i.e. if the command is 'define', the symbol is
bound to the top net on the stack. The net is removed from the
stack. See also: 'load defined', 'save defined', 'print defined',
'read regex', 'undefine'.
-
determinize net
-
replaces the top network with an equivalent deterministic network. A
deterministic net has at most one transition per label from each
state.
-
echo text
-
Prints the text, followed by a newline.
-
edit properties
-
'edit properties' allows the user to edit the property list of the top
network on the stack. Properties may be added, deleted, and modified. Values
must be integers or strings.
-
eliminate flag
-
replaces the top network on the stack with an equivalent network that
encodes directly the constraints concerning the given flag
attribute. A flag symbol contains two or three fields separated by a
comma. Flag symbols with three fields are of the form
@Action.Attribute.Value@ where
Action is U (unify), R (require),
P (set to positive value), N (set to the complement
value). The other fields, Attribute and
Value, can be chosen freely. Flag symbols with two
fields are of the form @Action.Attribute@,
where Action is either C (clear), or
R (require), or D (disallow). For example, the
command 'eliminate Harmony' removes all flags symbols with Harmony in
the the attribute field and all paths that fail to meet the Harmony
constraints.
-
epsilon-remove net
-
replaces the top network with an equivalent one that has no epsilon
transitions.
-
help
help command-name
-
'help' lists the commands and the command classes. To get more
information about a command, use the command 'help command'. It gives
a longer explanations about the command and its usage. All possible
arguments are listed and explained. For instance, the command 'help'
(see above) may be used without arguments or it may have one argument
that is a name of a command.
The command name may designate a class of commands, for instance, 'help
stack' prints all stack commands. If the given command name is not recognised,
command help works like command 'apropos'.
-
inspect net
-
invokes a mode for interactively inspecting the top network on the stack.
-
intersect net
-
replaces the stack with the intersection of all networks currently on the
stack. If the stack consists of two networks, defined as A and B, regardless
of the order, the result is equivalent to compiling the regular expression
[A & B].
-
invert net
-
exchanges the two sides of the top network on the stack. If the network
is defined as A, the result is equivalent to compiling the regular expression
[A.i]. 'invert net' has no effect on simple networks.
-
label net
- replaces the top network on the stack with a network that
represents the language or relation that includes all the strings
and string pairs in the label alphabet of the original network.
See also 'print labels', 'sigma net'.
-
load defined source-file
-
restores the definitions from a binary source file. The networks are
assigned to the chosen names directly, without pushing them onto the
stack. The file must be in the binary format, as produced by the 'save
defined' command.
-
load stack source-file
-
reads the networks in the given file, and pushes them onto the the stack.
The file must be in the binary format, as produced by the 'save stack'
command. If the file only contains a single network, it can also be loaded
with the command 'read regex [@bin source-file];'.
-
lower-side net
-
extracts the lower language of the top network on the stack. If the network
is defined as A, the result is equivalent to compiling the regular expression
[A.l]. 'lower-side net' has no effect on simple networks.
-
minimize net
-
replaces the top network on the stack with an equivalent network that has
a minimal number of states.
-
minus net
-
replaces the top two networks on the stack with a network containing all
strings in the first network that are not in the second. If the two networks
are defined as A and B, with A on top of the stack, the result is equivalent
to compiling the regular expression [A - B].
-
name net symbol
-
assigns the given name to the top network on the stack.
-
negate net
-
replaces the top network on the stack with a network that accepts all and
only those strings rejected by the original. If the network is defined
as A, the result is equivalent to compiling the regular expression [~A].
-
one-plus net
-
concatenates the top network any number of times with itself. If the network
is defined as A, the result is equivalent to compiling the regular expression
[A+].
-
optimize net
-
runs a heuristic algorithm that tries to reduce the number of arcs, possibly
at the cost of introducing more states. 'unoptimize net' restores
the original network.
-
pop stack
-
Removes the top network on the stack.
-
print aliases
print aliases > target-file
-
prints each alias and the command sequence it encodes.
-
print defined
print defined > target-file
-
prints each defined symbol and the size of the network it stands for.
-
print directory
print directory pattern
print directory pattern > target-file
-
prints the content of the current directory in its entirety or the files
that match the given pattern, such as *.txt.
-
print eqv-labels
print eqv-labels > target-file
-
prints the label equivalence classes of the networks on the stack. Labels
that belong to the same equivalence class always appear together on arcs
that lead to the same destination state.
-
print file-info
print file-info > target-file
-
displays info about the last opened binary file.
-
print flags
print flags > target-file
-
'print flags' prints information about the top network on the stack.
-
print labels
print labels symbol
print labels > target-file
- prints the label alphabet of the top network on the stack (by
default) or of the given defined network. The label alphabet includes all the
symbols and symbol pairs that occur as arc labels in the network. See also
'print sigma', 'label net'.
-
print label-tally
print label-tally > target-file
-
tallies label frequencies in the top network on the stack.
-
print lower-words
print lower-words name
print lower-words number
print lower-words name number
print lower-words > target-file
-
displays the words in the lower side language of the top network or in
the network assigned to the defined symbol. If a number is given, only
that many words are printed, otherwise every word is printed but not looping
on any path more than once. The output can be directed to a file by adding
a greater-than symbol, >, and a name of a target file to the end of the
command. For example, 'print lower-words > lower-words.txt'. If the network
is a simple network rather than a transducer, 'print lower-words', 'print
words', and 'print upper-words' give the same output. See also the variable
'print-space'.
-
print name
-
prints the name of the top network on the stack. If the net is unnamed,
the hex address of the net is printed instead. Use the command 'name net'
to give a name to the top network.
-
print net
print net symbol
print net > target-file
-
prints a text representation of the top network on the stack (by
default) or the given defined network. The output is directed to a file by
adding a greater-than symbol, >, and a name of a target file to the
end of the command.
-
print nth-lower
print nth-lower name
print nth-lower number
print nth-lower name number
print nth-lower > target-file
-
displays the word on the lower side of the nth path in the top level network
or in the network assigned to the defined symbol. This command only works
on acyclic networks. The inverse command is 'print num-lower'.
-
print nth-upper
print nth-upper name
print nth-upper number
print nth-upper name number
print nth-upper > target-file
-
displays the word on the upper side of the nth path in the top level network
or in the network assigned to the defined symbol. This command only works
on acyclic networks. The inverse command is 'print num-upper'.
-
print num-lower word
print num-lower name word
print num-lower > target-file
-
displays the numbers of all paths in the top level network or int the named
network that have word on the lower side. This command only works on acyclic
networks. The inverse command is 'print nth-lower'.
-
print num-upper word
print num-upper name word
print num-upper > target-file
-
displays the numbers of all paths in the top level network that have word
on the upper side. This command only works on acyclic networks. The inverse
command is 'print nth-upper'.
-
print random-lower
print random-lower number
print random-lower > target-file
-
generates 15 random words (by default) from the top network on the stack
or any number specified in the command. The words are from the lower side
language of the net. If the network is a simple automaton, rather than
a transducer, 'print random-lower' gives the same output as 'print random'
and 'print random-upper'.
-
print random-upper
print random-upper number
print random-upper > target-file
-
generates 15 random words (by default) from the top network on the stack,
or any number specified in the command. The words are from the upper side
language of the net. If the network is a simple automaton, rather than
a transducer, 'print random-upper' gives the same output as 'print random'
and 'print random-lower'.
-
print random-words
print random-words number
print random-words > target-file
-
generates 15 random words (by default) from the top network on the stack
or any number specified in the command. If the network is a transducer,
the output may contain symbol pairs. If the network is a simple automaton,
rather than a transducer, 'print random' gives the same output as 'print
random-lower' and 'print random-upper'.
-
print sigma
print sigma name
print sigma > target-file
-
prints the sigmaalphabet of the top network on the stack (by
default) or the given defined network. The sigma alphabet includes all the
symbols that appear in the network, either as such or as a constituent
of a symbol pair, and any symbols that have been excluded from
the network by subtraction or complementation. See also 'print labels',
'sigma net'.
-
print size
print size name
print size > target-file
-
prints the size of the top network on the stack (by default) or the given
defined network.
-
print stack
print stack > target-file
-
displays the content of the stack.
-
print storage
print storage > target-file
-
displays memory usage
-
print upper-words
print upper-words name
print upper-words number
print upper-words name number
print upper-words > target-file
-
displays the words in the upper side language of the top network or in
the network assigned to the defined symbol. If a number is given, only
that many words are printed, otherwise every word is printed but not looping
on any path more than once. The output can be directed to a file by adding
a greater-than symbol, >, and a name of a target file to the end of the
command. For example, 'print upper-words > upper-words.txt'. If the network
is a simple network rather than a transducer, 'print lower-words', 'print
words', and 'print upper-words' give the same output. See also the variable
'print-space'.
-
print words
print words name
print words number
print words name number
print words > target-file
-
displays the paths in the top network or in the network assigned to the
defined symbol. If a number is given, only that many paths are printed,
otherwise every word is printed but not looping on any path more than once.
The output can be directed to a file by adding a greater-than symbol, >,
and a name of a target file to the end of the command. For example, 'print
words > words.txt' If the network is a transducer, the output may contain
symbol pairs. Otherwise, 'print lower-words', 'print words', and 'print
upper-words' give the same output. See also the variable 'print-space'.
-
prune net
-
removes all paths leading to non-final states in the top network on the
stack.
-
push defined symbol
-
makes a copy of the network named by a previous 'define symbol'
command and pushes it onto the stack. 'push defined symbol' has
the same effect as 'read regex symbol ;'.
-
quit
-
Exits the program.
-
read prolog
read prolog < source-file
-
reads the network(s) contained in the file and pushes them onto the
stack. The file must be in the format produced by the 'write prolog'
command. The the same effect can be produced with the command 'read
regex [@pl source-file];'
-
read properties
read properties < source-file
-
reads a set of attribute-value pairs resets the property list of the top
network on the stack. See also: 'add properties', 'write properties'.
-
read regex
read regex < source-file
-
creates a network from a regular expression and and pushes it onto the
stack. The file should contain one regular expression. If no filename is
given the input is read from the console. The input expression must be
terminated by a semicolon.
-
read spaced-text
read spaced-text < source-file
-
reads a transducer or a net with multicharacter symbols from a file. See
also function 'print words' and variable 'print-space'.
-
read text
read text < source-file
-
reads the net as text from a file. See also function 'print words' and
variable 'print-space'.
-
reverse net
-
replaces the top net on the stack with one that accepts the reverse
language.
-
rotate stack
-
Pushes the top element of the stack to the bottom.
-
save defined target-file
-
stores the networks for all defined symbols into a binary file. Use 'load
defined' to restore the definitions.
-
save stack target-file
-
stores all the networks on the stack into the given file in a binary format.
Use 'load stack' to load the file.
-
set variable value
-
sets a new value to the variable. The value may be an integer, ON or OFF,
or a string. See variable for details. See also: 'show'.
-
show variable
-
shows the value of the variable. See also 'set'.
-
shuffle net
-
replaces the stack with a single network that accepts every string formed
by shuffling together (interdigitanting) one string from each of the input
languages. For instance, if Net1 accepts ab and Net2 accepts xy, then the
shuffle of these nets will accept the following six strings: abxy axby
axyb xaby xayb xyab.
-
sigma net
-
replaces the top network on the stack with a network that
encodes the "sigma language" of the original network, that is,
the union of all symbols in the sigma alphabet. See also
'print sigma', 'label net'.
-
sort net
-
Reorders the arcs of the top network in a canonical order.
-
source command-file
-
executes a file of XFST commands. For example, 'source xfst.script' executes
the commands in xfst.script.
-
substitute defined
-
'substitute defined NAME for SYMBOL' replaces the top network on the
stack with a network derived by splicing in the network associated
with the given name for each arc that contains the symbol. If the
network is a transducer, SYMBOL must not occur in a symbol pair.
-
substitute label
-
'substitute label LIST_OF_LABELS for LABEL' replaces the top network
on the stack with a network derived by replacing every arc with the
given label by an arc or arcs, each labeled with one of the substitute
labels. If the list consists of the keyword NOTHING, all paths
containing the given label are removed. The label and the substitutes
may be single symbols or a symbol pairs. See also
'substitute symbol'.
-
substitute symbol
-
'substitute symbol LIST-OF-SYMBOLS for SYMBOL' replaces the top
network on the stack with a network derived by replacing every arc
whose label contains the given symbol by an arc or arcs whose label
instead contains one of the substitute symbols. If the list consists
of the keyword NOTHING, all paths containing the given symbol are
removed. The symbol and the substitutes must be single symbols, not
symbol pairs. See also 'substitute label'.
-
substring net
-
replaces the top network on the stack with a network that accepts every
string that is a substring of some string in the language of the original
network.
-
system system-command
-
'system command' executes the given operating system command(s), for instance,
'system ls'.
-
test equivalent
-
returns 1 (TRUE) if the two topmost networks contain the same language
-
test lower-bounded
-
returns 1 (TRUE) if the lower side of the top level network has no epsilon
cycles.
-
test lower-universal
-
returns 1 (TRUE) if the lower side of the top level network represents
the universal language.
-
test overlap
-
returns 1 (TRUE) if the languages of the two topmost networks have a non-empty
intersection.
-
test sublanguage
-
returns 1 (TRUE) if the language of the topmost network is a sublanguage
of the second network on the stack.
-
test upper-bounded
-
returns 1 (TRUE) if the upper side of the top level network has no epsilon
cycles.
-
test upper-universal
-
returns 1 (TRUE) if the upper side of the top level network contains the
universal language.
-
turn stack
-
reverses the order of the networks on the the stack.
NOTE: Use the command 'reverse net' to create the reverse language
of the top net.
-
undefine list-of-symbols
-
unbinds the symbol that was bound by the 'define' command. 'undefine symbol1
symbol2 symbol3 ...' etc. undefines all the listed symbols. 'undefine ALL'
removes all the definitions.
-
union net
-
replaces the stack with the union of all networks currently on the stack.
-
unoptimize net
-
reverses the effect of 'optimize net' restoring the original
network.
-
upper-side net
-
extracts the upper language from the network on the top of the stack. If
the network is defined as A, the result is equivalent to compiling the
regular expression [A.u]. 'upper-side net' has no effect on simple
networks.
-
write prolog
write prolog > target-file
-
writes the network as Prolog-clauses. See also: 'read prolog'.
-
write properties
write properties name
write properties > target-file
-
prints the attribute-value pairs on the property list of the network. See
also 'add properties', 'read properties'.
-
write spaced-text
write spaced-text > target-filetarget-file
-
Stores the top network in a text format. This format is intended for
transducers and networks with multi-character labels. Use 'read
spaced-text' to read files in this format. Use 'write text' for simple
networks with single-character labels. Use 'save stack' to store in a
compact binary format. If no target file is specified, output will be
sent to standard output (usually the terminal).
-
write text
write text > target-file
-
stores the words in the top network, one per line. This format cannot be
used for transducers or networks with multi-character labels. For such
networks, use 'save-as-spaced-text' instead. Use 'save stack' to store
in a compact binary format. If no target file is specified, output will
be sent to standard output (usually the terminal).
-
zero-plus net
Variables
This is the list of variables that the user can set with the command 'set
variable value' with the list of available values. The command
'show variable' displays the current value of the variable.
-
directory filename
-
the working directory.
-
mark-version (ON/OFF)
-
when ON, the name and version number of the application is
recorded on the property list of each saved network under the property
TOOLVERSIONS. For example, TOOLVERSIONS: (("XFST"
"5.6.7")).
-
max-regex-errors (integer)
-
the maximum number of errors for regular expressions. When it is exceeded
the rest of the input file is discarded. Default is 10.
-
minimal (ON|OFF)
-
when ON, network operations minimize the result. Minimization includes
epsilon-removal and determinization. Default is ON.
-
name-nets (ON|OFF)
-
when ON, regex names networks. Default is OFF.
-
print-pairs (ON|OFF)
-
when ON,'apply up' and 'apply down' show both (UPPER and LOWER) sides
of labels. Default is OFF.
-
print-sigma (ON|OFF)
-
when ON, the sigma is shown when a network in printed. Default is ON.
-
print-space (ON|OFF)
-
when ON, 'print word' inserts a space between symbols. Default is OFF.
-
process-in-order (ON|OFF)
-
when ON, intersection and composition operations process networks in the
order they appear in the stack. Default is OFF.
-
quit-on-fail (ON|OFF)
-
when ON, the application exits with an error if a command is incorrectly
spelled or cannot be executed because of a missing file or some other error.
This is useful when the application is run by a shell script. Default is
OFF.
-
quote-special (ON|OFF)
-
When ON literal instances of special characters are enclosed in double
quotes: zero and question mark are printed as "0" and "?" to distinguish
them from epsilon and the unknown symbol. Space, tab, and newline are printed
as " ", "\t", and "\n", respectively, to distinguish them from each other.
The affected commands are 'print words', 'print lower-words', and 'print
upper-words'. Default is OFF.
-
random-seed (integer)
-
seed of the random number generator. It cannot be inspected.
-
sort-arcs (ON|OFF)
-
when ON, the arcs of a network are sorted before the network is pushed
onto the stack. Default is ON.
-
unicode (ON|OFF)
-
if the value is ON, two-byte characters are printed as four hex integers
preceded.by \u. If the value is OFF, we try to display 16-bit characters
directly. Default is OFF.
-
verbose (ON|OFF)
-
when ON, more information is printed. Default is ON.
Commands <-> RE operators
All the XFST commands that produce a new network from one or more
expressions on the stack and replace them with the result of the
operation correspond to an operation in the regular expression
calculus (see Syntax
and Semantics of Regular Expressions). Some of the load and read
commands also have a counterpart in the regular expression language.
The correspondence is summarized below along with a characterization
of the language or relation that the result implements, described in terms
of the corresponding regular expression. We also indicate for each command
whether it applies to the top network (unary), the two top networks (binary),
or to all networks on the stack (n-ary).
-
concatenate net (n-ary)
-
[A B] denotes the concatenation of all strings or string pairs
in A with all strings or string pairs in B.
-
compose net (n-ary)
-
[A .o. B] denotes the composition of the relations A and B.
-
crossproduct net (binary)
-
[A .x. B] denotes the relation that maps every string of A to
every string of B.
-
intersect net (n-ary)
-
[A & B] denotes the set of strings that belong to both A and
B.
-
invert net (unary)
-
[A.i] denotes the relation that is the inverse of A; that is the
upper and lower members of each string pair in the relation are switched
around but the relation remains otherwise the same.
-
load stack
-
[@bin source-file] denotes the topmost network in the binary
source-file. Note that @bin only retrieves one network,
whereas 'load stack' pushes onto the stack all the networks in the file.
If the name of the source file contains a period, a slash or some other
special character, the name should be enclosed in double quotes.
-
lower-side net (unary)
-
[A.l] denotes the lower language of the relation A.
-
minus net (binary)
-
[A - B] denotes the set of strings that are in A but not in B.
-
negate net (unary)
-
[~A] denotes the set of all strings that are not in A.
-
one-plus net (unary)
-
[A+] denotes all the strings that are composed by concatenating
strings of A with each other any number of times.
-
read prolog
read regex
read spaced-text
read text
-
have corresponding regular expression operators: @pl, @re,
@stxt, and @txt, respectively, These unary operators
are prefixed to the name of the source file. If the name contains a period,
a slash or some other special character, the name must be enclosed in double
quotes. For example,
read regex [@re regexfile] ;
compiles the regular expression in regexfile;
read regex [@txt "words.txt"] ;
has the same effect as
read text < words.txt
-
reverse net (unary)
-
[A.r] denotes the language or relation that is the reverse of
A.
-
substitute symbol SYMBOL_LIST for SYMBOL (unary)
-
`[A, SYMBOL, SYMBOL_LIST] denotes the language or relation
derived from A by replacing every instance of the given symbol by the
union of the symbols on the list.
-
union net (n-ary)
-
[A | B] denotes the union of expressions A and B. All the strings
and string pairs that belong to either A or B (or both) also belong to
the union.
-
upper-side net (unary)
-
[A.u] denotes the upper language of the relation A.
-
zero-plus net (unary)
-
[A*] denotes the set of strings that consists the empty string
and any number of concatenations of A with itself. [A*] is like
[A+] except that it always includes the empty string.
N.B. There are many regular expression operators without an equivalent
command in XFST. For example, the replace (->) and restriction (=>) operators
do not correspond to any XFST command.
We welcome your
comments and suggestions.
Copyright © The Document Company - Xerox 1997. All rights reserved.
Written by Lauri Karttunen, Tamás Gaál,
and André Kempe.
Last modified on Jan. 12, 1998
.