Assignment 2: Spanish Jerigonza game

Write an xfst script that performs the following tasks.

  1. Construct a transducer for the Spanish language game called "Jerigonza" (jargon).
    The rules of the game are explained below.
  2. Apply the Jerigonza transducer to a list of Spanish words read in from a file. You can use the file
    /mnt/linc/ftp/pub/cis639/public_html/assign/spanish-words.txt.
  3. Write the resulting Jerigonza words to the console or into another file.

Like Pig Latin, Jerigonza is a very simple game:

To convert a Spanish word into Jerigonza, add pV after each vowel, where V is a copy of the vowel in question.

For example, the word "pero" is "peperopo" in Jerigonza.
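The general rule can be cross-checked outside xfst with a few lines of Python. This is only a sketch of the plain rule, not an xfst solution: it handles the five unaccented vowels and ignores every complication listed below. The function name `jerigonza_basic` is a hypothetical helper introduced for illustration.

```python
# Sketch of the general rule only: add "p" plus a copy of the vowel
# after each unaccented vowel. Accents, silent u, and glides are ignored.
PLAIN_VOWELS = set("aeiou")


def jerigonza_basic(word):
    """Return the word with pV inserted after every plain vowel V."""
    pieces = []
    for ch in word:
        pieces.append(ch)
        if ch in PLAIN_VOWELS:
            pieces.append("p" + ch)
    return "".join(pieces)


if __name__ == "__main__":
    print(jerigonza_basic("pero"))  # peperopo
```

Each input letter is visited exactly once, so an inserted vowel is never copied again.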

There are a few complications to this simple general rule.

  1. In addition to a, e, i, o, u, Spanish has accented vowels. All five vowels can have the acute accent: á, é, í, ó, ú. The acute accent marks a stress that is not predictable by the general Spanish stress rule. Some of the vowels may also occur with the dieresis: ï, ü. In Jerigonza, the copy of an accented vowel does not have the accent.
  2. The letter u is not pronounced and does not count as a vowel when it is preceded by q or when it is preceded by g and followed by e or i. Thus there is only one vowel in the word "que" and three in "águila" and "liguero". The corresponding Jerigonza words are "quepe", "ápaguipilapa", "lipigueperopo".
  3. The vowels i and u are pronounced as glides in certain contexts where they do not count as vowels. In particular,
    1. The letter i is a glide if it is preceded by any vowel (accented or unaccented) other than i or u, or if it is followed by a vowel. For example, the i in "baile" and "glosario" is not duplicated in Jerigonza. These words are rendered as "bapailepe" and "gloposapariopo".
    2. The letter u is a glide if it is preceded by any vowel (accented or unaccented) other than i or u, or if it is followed by a vowel. For example, the u in "audaz" and "ueste" is not duplicated. They become "apaudapaz" and "uepestepe".
    3. When i and u occur together, the first one is a glide. For example, "buir" and "piurano" correspond to "buipir" and "piupurapanopo" in Jerigonza.
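The complications above can be collected into a small Python reference function for checking a transducer's output. It is a sketch under stated assumptions, not the intended xfst solution: the names `jerigonza` and `is_copied` are hypothetical, accented letters are assumed to be single precomposed characters, and "any vowel other than i or u" is read as a, e, o and their accented forms.

```python
# Reference sketch of the Jerigonza rules, for cross-checking only.
# Assumption: accented letters are precomposed (one character each).
UNACCENTED = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u",
              "ï": "i", "ü": "u"}
VOWELS = set("aeiou") | set(UNACCENTED)
# Assumption: these are the vowels that turn a following i or u into a glide.
GLIDE_TRIGGERS = set("aeoáéó")
FRONT_VOWELS = set("eiéí")


def is_copied(word, k):
    """Decide whether the letter at position k counts as a vowel to copy."""
    ch = word[k]
    if ch not in VOWELS:
        return False
    if ch not in ("i", "u"):
        return True  # a, e, o and all accented vowels are always copied
    prev = word[k - 1] if k > 0 else ""
    nxt = word[k + 1] if k + 1 < len(word) else ""
    # Complication 2: silent u after q, or after g and before e or i.
    if ch == "u" and (prev == "q" or (prev == "g" and nxt in FRONT_VOWELS)):
        return False
    # Complication 3: i and u are glides after a, e, o or before any vowel.
    # This also covers "the first of i and u together is a glide".
    if prev in GLIDE_TRIGGERS or nxt in VOWELS:
        return False
    return True


def jerigonza(word):
    """Insert p plus the unaccented vowel after every copied vowel."""
    pieces = []
    for k, ch in enumerate(word):
        pieces.append(ch)
        if is_copied(word, k):
            pieces.append("p" + UNACCENTED.get(ch, ch))
    return "".join(pieces)


if __name__ == "__main__":
    for w in ["que", "águila", "baile", "audaz", "piurano", "buir"]:
        print(w, "->", jerigonza(w))
```

Because the decision for each letter looks only at the original word, inserted syllables never influence later decisions. Words containing ï or ü next to another vowel fall outside the rules as stated, so the behaviour of this sketch on them is an assumption rather than a specification.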

Example

Here is an example of how things should work when you are done.

Launching xfst with the script prints progress messages:

% xfst -f jerigonza.scr
>> Defining 'Vowel' and 'Cons'
2 states, 12 arcs, 12 paths.
2 states, 22 arcs, 22 paths.
>> Compiling the 'Jerigonza' transducer
20 states, 146 arcs, Circular.
>> Reading a test file
Opening file /mnt/linc/ftp/pub/cis639/public_html/assign/spanish-words.txt...
Closing file /mnt/linc/ftp/pub/cis639/public_html/assign/spanish-words.txt...
61 states, 72 arcs, 13 paths.
>> Applying ...
111 states, 122 arcs, 13 paths.

>> Here are the original words
ueste
irgues
que
guardia
guerra
glosario
águila
automóvil
audaz
piurano
situación
liguero
después
buir

>> Here are the Jerigonza words
uepestepe
ipirguepes
quepe
guapardiapa
gueperrapa
gloposapariopo
ápaguipilapa
apautopomópovipil
apaudapaz
piupurapanopo
sipituapaciópon
lipigueperopo
depespuépes
buipir

bye.
%

Hints

Here are the initial definitions:
echo >> Defining 'Vowel' and 'Cons'

define Vowel [ a | e | i | o | u | á | é | í | ó | ú | ï | ü ] ;
define Cons  [ b | c | d | f | g | h | j | k | l | m | n | ñ |
               p | q | r | s | t | v | w | x | y | z ] ;

Typing accented letters in Emacs is cumbersome if you don't have your own definitions for them. It takes four keys in Emacs to make á: you have to type C-x 8 ' a.

Because the accents on the vowels are not copied, the same inserted syllable, e.g. pe, may represent either e or é. Be careful not to let the copied vowel be copied a second time. For instance, "sé" must translate to "sépe", not to "sépepe".
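The double-copy pitfall is easy to reproduce outside xfst. The Python sketch below contrasts a deliberately naive chain of rewrites, where the second step sees the output of the first, with a single pass over the input letters. The function names `cascade` and `single_pass` are hypothetical, glides and silent u are ignored, and accented letters are assumed to be single precomposed characters.

```python
# Illustration of the "sépepe" pitfall; glides and silent u are ignored here.
# Assumption: "é" is one precomposed character, distinct from "e".
UNACCENTED = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u",
              "ï": "i", "ü": "u"}
VOWELS = set("aeiou") | set(UNACCENTED)


def cascade(word):
    """Deliberately naive: rewrite é, then e, each step seeing earlier output."""
    word = word.replace("é", "épe")
    word = word.replace("e", "epe")  # also hits the e that was just inserted
    return word


def single_pass(word):
    """Visit each input letter once; inserted letters are never revisited."""
    pieces = []
    for ch in word:
        pieces.append(ch)
        if ch in VOWELS:
            pieces.append("p" + UNACCENTED.get(ch, ch))
    return "".join(pieces)


if __name__ == "__main__":
    print(cascade("sé"))      # sépepe (wrong)
    print(single_pass("sé"))  # sépe
```

The same effect can arise in a transducer whenever separate rewrite steps are chained so that a later step applies to the output of an earlier one.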

The treatment of a, e, o and their accented variants is straightforward. The high vowels i and u are more difficult to handle correctly. The best way to proceed is to first ignore the special properties of i and u. Once you have a transducer for the general case, it is easier to see how to handle the high vowels.

In the final stage, you need to add constraints that prevent i and u from being copied in contexts where they are realized as glides and in contexts where u is not pronounced at all. Once you get i right, there is not much more to be done for u.

The size of the Jerigonza transducer is about 16 states, 112 arcs.


Thanks to Mark Liberman for this exercise.