ALGORITHMS AND IMPLEMENTATION

Decoding the transfer data files (file: transfer-lexicon.lisp)

Reading input

The transfer data is described in files. The files are read and decoded by different finite state automatons.

While reading a file, the input is composed of two parts: a string and a delimiter. The string is any sequence of characters which are not delimiters, the delimiter is the first special delimiter that follows the string, ignoring some meaningless delimiters (space, tab, newline and ",").

The transitions in the fsa are defined in terms of these two imputs, string and delimiter, which can be null (not both in the same input): a null string occurs when there are two consecutive special delimiters, a null delimiter when two strings are separated by meaningless delimiters.

A possibility to differ input has been added to the basic input functions, to allow transitions that don't use their input (or part of). The input functions first look for differed input before reading the files.

The main functions for input are:

Errors

When reading the files, errors can be reported (syntax errors especially). To have a better indication of the place where an error occurs in a definition, the definition while being read is echoed on an other stream, which can be read when an error occurs, to show the characters that have been read so far.

FSA

In the descriptions below, the transitions from one state to another are recorded with the possible inputs:

In the right side of the description, after ";" characters, are the interpretations of the strings or the name of another fsa which is called by the transition, in capital letters.

FAMILY DEFINITION FSA:

The family transfer definitions:

       (0) --- string/ ---> (1)         ; family name 1
       (1) --- string: ---> (2)         ; family name 2

       (2) --- string: ---> (3)         ; keyword
       (2) ------ { ------> (4)

       (3) ------ { ------> (4)

       (4) ---------------> (5)         ; NODE PAIRS FSA

       (5) ------ . ------> end
       (5) ---------------> (5)         ; TREE PAIRS FSA

TREE PAIRS FSA and TREE DEFINITION FSA:

The tree pairs and tree definition fsa's:

      (0) ---------------> (1)          ; TREE FSA
      (1) ------ / ------> (2)

      (2) ---------------> (3)          ; TREE FSA

      (3) ------ : ------> (4)
      (3) --- string ----> (0.1)

      (4) --- string: ---> (5)          ; keyword (tree def)
      (4) ------ { ------> end          ; NODE PAIR FSA (family def)
      (4) ------ { ------> (6)          ; NODE PAIR FSA (tree def)

      (5) ------ { ------> (6)          ; NODE PAIR FSA

      (6) ------ . ------> end          ; (in tree def)
TREE FSA:

The fsa for the tree definition file:

      (0) --- string/ ---> end          ; tree name
      (0) --- string: ---> end          ; tree name
      (0) --- string{ ---> (1)          ; tree name + FEATURE FSA
      (0) --- string[ ---> end          ; tree name + DERIVATION FSA
      (0) --- string ----> end          ; tree name

      (1) ------ / ------> end
      (1) ------ : ------> end
      (1) ------ [ ------> end          ; DERIVATION FSA
      (1) --- string ----> end          ; tree name for next tree pair
NODE PAIRS FSA:

The fsa for the node pair mapping file:

      (0) --- string: ---> (0.1)        ; type specification
      (0) --- string/ ---> (2)          ; node 1
      (0) --- string# ---> (1)          ; node 1
      (0) ------ } ------> end

      (0.1) idem (0) except -> (0.1)

      (1) --- string/ ---> (2)          ; tree id for node 1

      (2) --- string ----> (0)          ; node 2, other node pairs
      (2) --- string# ---> (3)          ; node 2
      (2) --- string} ---> end          ; node 2

      (3) --- string ----> (0)          ; tree id for node 2, other node pairs
      (3) --- string} ---> end          ; tree id for node 2
LEXICAL DEFINITION FSA:

The fsa for the lexical transfer file:

      (0) ---------------> (1)          ; LEX ENTRY FSA
      (1) ------ / ------> (2)

      (2) ---------------> (3)          ; LEX ENTRY FSA

      (3) ------ : ------> (4)
      (3) ------ . ------> end

      (4) --- string: ---> (4)          ; keyword
      (4) --- string. ---> end          ; keyword
      (4) ------ { ------> (5)          ; NODE PAIRS FSA

      (5) ------ . ------> end
GLOBAL DEFINITION FSA:

The fsa for the global lexical transfer file:

      (0) ---------------> (1)          ; LEX ENTRY FSA
      (1) ------ / ------> (2)

      (2) ---------------> (3)          ; LEX ENTRY FSA

      (3) ------ : ------> (4)

      (4) ------ { ------> (5)          ; NODE PAIRS FSA

      (5) ------ . ------> end
LEX ENTRY FSA:

The fsa for the transfer of lexical entries:

      (0) --- string ----> (1)          ; lexical entry
      (0) --- string{ ---> end          ; lexical entry + FEATURE FSA (lex def)
      (0) --- string/ ---> end          ; lexical entry
      (0) --- string: ---> end          ; lexical entry
      (0) --- string. ---> end          ; lexical entry

      (1) --- string{ ---> (2)          ; tree/family name + FEATURE FSA
      (1) --- string[ ---> end          ; tree name + DERIVATION FSA (global def)
      (1) --- string/ ---> end          ; tree/family name
      (1) --- string: ---> end          ; tree/family name
      (1) --- string. ---> end          ; tree/family name

      (2) ------ / ------> end
      (2) ------ : ------> end
      (2) ------ . ------> end
      (2) ------ [ ------> end          ; DERIVATION FSA (global def)
DERIVATION FSA:

The fsa to recognize the mapping between derivation trees:

      (0) --- string ----> (1)          ; lexical entry
      (1) --- stringX ---> (2)          ; tree name

      (2) --- string ----> (0)          ; new son
      (2) ------ { ------> (2)          ; FEATURE FSA
      (2) ------ ( ------> (2)          ; ADDRESS FSA
      (2) ------ # ------> (3)
      (2) ------ [ ------> (4)
      (2) ------ ] ------> end

      (3) --- stringX ---> (2)          ; tree id

      (4) --- stringX ---> (0)          ; new son
      (4) ------ ] ------> end
ADDRESS FSA:

The fsa for tree address recognition:

      (0) --- string ----> (0)          ; number in address
      (0) --- string) ---> end          ; last number in address
      (0) ------ ) ------> end          ; null address (root)
FEATURE FSA:

as in equations or lexicon for grammars

SIMPLE FEATURE FSA:

A fsa for simple features:

      (0) --- string/ ---> (1)          ; head category 1
      (1) --- string: ---> (2)          ; head category 2

      (2) ------ . ------> end
      (2) ---------------> (2)          ; FEATURE PAIR FSA
FEATURE PAIR FSA:

A fsa to recognize feature pairs:

      (0) ------ < ------> (1)
      (1) --- string ----> (1)          ; element of path 1
      (1) --- string> ---> (2)          ; element of path 1

      (2) ------ = ------> (3)

      (3) --- string/ ---> (4)          ; value 1

      (4) ------ < ------> (5)

      (5) --- string ----> (5)          ; element of path 2
      (5) --- string> ---> (6)          ; element of path 2

      (6) ------ = ------> (7)

      (7) --- string< ---> end          ; value 2, feature pair next
      (7) --- string. ---> end          ; value 2, end feature pairs

Organization of transfer data and access

(file transfer-manager.lisp)

Given a source derivation node, the transfer data base is searched for all the transfer structures that are compatible with the elementary tree of the derivation node: combination of tree and lexical transfers and global transfers.

The transfer manager stores the transfer data according to their type, and provides the relevant access index:

Use of keywords:

Keywords are used for the combination of lexical transfers and tree transfers:

With non-lexicalized tree transfers, different transfers can be defined between two trees. For example, in the test data, we have two transfers between some trees of the families Tnx0Vnx1, <alpha>nx0Vnx1 for example.

The first one is a subject/subject and object/object transfer ("love/aimer"), and the second is a subject/object and object/subject transfer (AN EXAMPLE ?, there is "miss/manquer", but it is not the same families).

In such cases, it is necessary to make the two transfers different, as a lexical transfer between entries that belongs to this family pair will have to select one of them. This is the purpose of the keywords.

Each family (or tree) transfer will have ONE keyword associated with it (by default it is the null keyword, for easy use), and the a possibly ambiguous lexical transfer will have to specify some keywords to tell the transfer manager which tree transfers has to be used with this lexical transfer.

Algorithm (function find-possible-transfer-structures):

Translation application

The main function for transfer is transfer-sentence, which is defined in the file transfer.lisp.

The translation is done in three successive phases :

Transfer algorithm (file transfer.lisp)

Given a source derivation tree, construct the target derivation lattice from the transfer structures associated with each source derivation node.

function transfer-derivation-tree:

function build-transfer-structures:

        build-transfer-structure-simple-to-tree,
        build-transfer-structure-complex-to-tree,
        build-transfer-structure-simple-to-feature,
        build-transfer-structure-complex-to-feature,

        see file transfer.lisp for more details.

Generation algorithm (file generation.lisp)

This phase is not really a generation phase, the term expansion phase should be more appropriate: expand the derivation lattice to find the implicit individual derivation trees in the target language.

The lattice is in fact an AND/OR graph, and in the current implementation is systematically traversed to find all solutions. From a node in the target lattice:

The algorithm is depth-first, top-down, and retrieves one solution at a time (function generate-target-derivation-trees):

This algorithm is very inefficient, does a lot of redundant computations. It could be optimized by storing some intermediate results. Also, probably a bottom-up, breadth-first algorithm would be more efficient, if all complete individual derivation trees would have to be available at the same time, as such an algorithm would construct all solutions in parallel. The implemented one (with some improvements) could be used if some processing was to be done on each derivation tree, which can be discarded afterwards.

Each time a solution is found (function target-derivation-tree-complete), the individual target derivation tree has to be validated:

function validate-target-derivation-tree:

Morphological generation algorithm (file morpho-gener.lisp)

The goal of the morphological generation is to find the relevant morphological entry (or entries) for each of the derivation nodes, given the features that are requested by the structure of the derivation tree itself and the features that are proposed by transfer from morphological heads in the source language. As dependencies can exist between morphological heads of different nodes, the morphological generation cannot be done independently for each derivation node.

function morphological-generation:

Sentence generation (file sentences.lisp)

After the generation (expansion) phase, we have a set of derivation trees in the target language. Each elementary tree in each derivation tree has a list of the possible words within this derivation. The sentence generation is simply a projection of a derivation tree to a linear sequence of words (heads, terminal leaves of each elementary tree and attachments).

The generation is done by recursively traveling in the derivation tree, starting from the root of the derivation tree (function generate-sentence-internal). For each derivation node, the goal is to place in their relative order all elements that are associated to words: the sons of the derivation node and the heads (including terminals and attachments).

Feature propagation in a derivation tree (file feature-deriv.lisp)

The feature propagation has two purposes. The first is to check the validity of the derivation tree by detecting feature clashes, and the second is to maintain all dependencies between features, which is necessary for morphological generation. We use therefore a destructive unification algorithm, to ensure the global dependencies.

The basic (recursive) function is propagate-features-internal, which operates on a derivation node:


[ Index Page | XTAG Page ]

This page is a modified version of the original written by Gilles Prigent (prigentg@lannion.cnet.fr). The page is currently maintained by Anoop Sarkar (anoop@linc.cis.upenn.edu).