Assignment 10: Flag diacritics in Arabic

Arabic nouns like kitaab ("book") can usually take any one of six case endings:
        kitaabu        definite nominative
        kitaaba        definite accusative
        kitaabi        definite genitive

        kitaabun       indefinite nominative
        kitaaban       indefinite accusative
        kitaabin       indefinite genitive
But if the overt definite article al appears attached to the front of the noun, which is morphologically and orthographically possible, then only the three definite case endings are possible. Looked at from the negative point of view, the overt definite article prefix al is incompatible with the indefinite case suffixes.

Arabic nouns may also take a prefix bi, with a prepositional meaning, that attaches to the beginning of the noun and is compatible only with genitive case endings. So bikitaabi ("for (the) book", definite genitive) and bikitaabin ("for (a) book", indefinite genitive) are well-formed, but bi is incompatible with nominative and accusative case endings, as in *bikitaabu and *bikitaaban.

When bi and al occur together, as in bialkitaabi, only the definite genitive case ending is possible.
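
The three constraints above can be restated as a small predicate. The following Python sketch is not part of the assignment and does not use flag diacritics; it only enumerates the prefix and ending combinations under the rules as stated, as a way to check one's reading of the data. The names ENDINGS, well_formed and surface are hypothetical.

```python
from itertools import product

# Case endings from the table above: suffix -> (case, definiteness)
ENDINGS = {
    "u": ("Nom", "Def"), "a": ("Acc", "Def"), "i": ("Gen", "Def"),
    "un": ("Nom", "Indef"), "an": ("Acc", "Indef"), "in": ("Gen", "Indef"),
}

def well_formed(bi, al, suffix):
    """Apply the two co-occurrence constraints described in the text."""
    case, definiteness = ENDINGS[suffix]
    if al and definiteness != "Def":   # al excludes indefinite endings
        return False
    if bi and case != "Gen":           # bi requires a genitive ending
        return False
    return True

def surface(bi, al, suffix):
    """Build the surface string for the single stem kitaab."""
    return ("bi" if bi else "") + ("al" if al else "") + "kitaab" + suffix

# Enumerate all 4 x 6 = 24 prefix/ending combinations and keep the legal ones.
forms = sorted(
    surface(bi, al, s)
    for bi, al, s in product([False, True], [False, True], ENDINGS)
    if well_formed(bi, al, s)
)
print(len(forms))  # 12
```

Of the 24 combinations, 12 survive: six with no prefix, three with al, two with bi, and one with both prefixes.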

The Task

Consider the following microlexicon for Arabic, written in the lexc formalism, which does not yet enforce the constraints described above:
    Multichar_Symbols
      @U.Type.Def@  @U.Type.Indef@
      @U.Case.Nom@  @U.Case.Acc@  @U.Case.Gen@
      +Prep+ Det+ +Indef +Def +Nom +Acc +Gen

    LEXICON Root
      Prep;
      DefPrefix;
      Nouns;

    LEXICON Prep
      PrepPrefix;

    LEXICON PrepPrefix
      bi+Prep+:bi Det;
      bi+Prep+:bi Nouns;

    LEXICON Det
      DefPrefix;
    
    LEXICON DefPrefix
      Det+:al Nouns;

    LEXICON Nouns
      kitaab Case;

    LEXICON Case
      DefCase;
      IndefCase;

    LEXICON DefCase
      +Nom+Def:u #;
      +Acc+Def:a #;
      +Gen+Def:i #;

    LEXICON IndefCase
      +Nom+Indef:un #;
      +Acc+Indef:an #;
      +Gen+Indef:in #;
Your task is to augment this lexicon with flag diacritics that allow al only with the definite case endings and bi only with the genitive case endings.

Hints

This task is not quite as trivial in practice as it may appear conceptually because you need to ensure that the flag diacritics are present on the upper as well as the lower side of the network.

One stupid technical difficulty is that if you opt for lexc entries that have the form of a regular expression, then + and @ need to be prefixed with the escape character %:

   < %@U.Case.Nom%@ %+Nom:u >
The simplest way to check your solution is to use the compile source command in lexc to construct the lexicon. Then use source-to-result to move the network to the slot for the "result" network (usually created by composing with rules) and then recompile the source again to have both "result" and "source" even though they are identical in this case.

If you now set the toggle on the singles flag, check-all should display the following mapping:

    kitaab+Acc+Def               kitaaba
    kitaab+Gen+Def               kitaabi
    kitaab+Nom+Def               kitaabu
    kitaab+Acc+Indef             kitaaban
    kitaab+Gen+Indef             kitaabin
    kitaab+Nom+Indef             kitaabun
    bi+Prep+kitaab+Gen+Def       bikitaabi
    bi+Prep+kitaab+Gen+Indef     bikitaabin
    bi+Prep+Det+kitaab+Gen+Def   bialkitaabi
    Det+kitaab+Acc+Def           alkitaaba
    Det+kitaab+Gen+Def           alkitaabi
    Det+kitaab+Nom+Def           alkitaabu
When the obey-flags switch is ON (default), check-all does not display the diacritic symbols but enforces the constraints. If you toggle the switch OFF, the flags are treated as ordinary symbols; that is, they are displayed and no constraints are enforced. In the case at hand, the constraints rule out half of the paths in the network.
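
The effect of the switch on unification flags can be imitated in a few lines of Python. This is only a rough sketch of the @U.Feature.Value@ behaviour described above, not of how lexc or xfst implement it; the function names and the placement of the flags in the sample paths are assumptions made for illustration, not a reference solution.

```python
import re

# Matches unification flags of the form @U.Feature.Value@
FLAG = re.compile(r"^@U\.(\w+)\.(\w+)@$")

def obeys_flags(symbols):
    """True if every feature is set to at most one value along the path."""
    seen = {}
    for sym in symbols:
        m = FLAG.match(sym)
        if m is None:
            continue                      # ordinary symbol, no constraint
        feature, value = m.groups()
        if seen.setdefault(feature, value) != value:
            return False                  # conflicting values: path blocked
    return True

def visible(symbols):
    """The string displayed when flags are obeyed (flags are hidden)."""
    return "".join(s for s in symbols if not FLAG.match(s))

# Hypothetical paths: bi sets Case to Gen, the ending sets Case and Type.
good = ["b", "i", "@U.Case.Gen@", *"kitaab", "@U.Case.Gen@", "@U.Type.Def@", "i"]
bad = ["b", "i", "@U.Case.Gen@", *"kitaab", "@U.Case.Nom@", "@U.Type.Def@", "u"]

print(obeys_flags(good), visible(good))
print(obeys_flags(bad))
```

With the switch OFF, nothing like obeys_flags would be applied and the flag symbols would simply appear in the output alongside the ordinary symbols.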

Compiling the constraints into the network with the eliminate flag command in xfst nearly doubles the size of the transducer.


Copyright © 1998 Xerox Corporation.