Assignment 1: Pig Latin translator
Using the xfst tool, create a transducer for
translating from English (or any other language) into Pig Latin.
There may be different varieties of Pig Latin. Let's pick the simplest
one, defined by the rule:

    If the word begins with a consonant, the corresponding Pig Latin
    word is the same except that the initial consonant has been moved
    to the end of the word and suffixed with "ay".

For example, the sentence "pig latin is fun" corresponds to "igpay
atinlay is unfay" in Pig Latin.
Your task is to write an xfst script, to be run with the command
xfst -l piglatin.scr,
that leaves on the stack a transducer with the following properties:
- The upper side language of the transducer is the universal language.
That is, it contains all strings of any length.
- The transducer is unambiguous in the top-down direction. That is,
each string in the upper language corresponds to one and only one string
in the lower language.
- When you apply your transducer "downward"
to an input line, xfst displays the corresponding Pig Latin
translation for all words that begin with a consonant. Otherwise the
output is the same as the input.
- Words consist of consonants and vowels.
- The consonants are b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t,
v, w, x, y, z.
- The vowels are a, e, i, o, u.
- The whitespace characters space and tab are word separators. Thus
the string "pig latin" contains two words.
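As a cross-check on the properties above, the expected input/output behaviour can be sketched outside xfst. The following Python reference is not part of the assignment and is not a transducer. The names `pig_latin_word` and `pig_latin_line` are hypothetical, and the sketch assumes lowercase input made up only of the listed consonants and vowels, with space and tab as separators. It may be handy for producing expected outputs to compare against what "apply down" prints.

```python
import re

CONSONANTS = "bcdfghjklmnpqrstvwxyz"  # the 21 consonants listed above


def pig_latin_word(word: str) -> str:
    """Move a word-initial consonant to the end and add "ay"; otherwise unchanged."""
    if word and word[0] in CONSONANTS:
        return word[1:] + word[0] + "ay"
    return word


def pig_latin_line(line: str) -> str:
    """Translate every word in a line, leaving spaces and tabs exactly as they are."""
    # Split on runs of space/tab; the capturing group keeps the separators.
    parts = re.split(r"([ \t]+)", line)
    return "".join(
        p if p == "" or p[0] in " \t" else pig_latin_word(p) for p in parts
    )


print(pig_latin_line("pig latin is fun"))  # prints: igpay atinlay is unfay
```

Words that begin with a vowel pass through unchanged, which matches the requirement that the upper-side language is universal and every input has exactly one output.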
Example
Here is an example of how things should work when you are done.
xfst -l piglatin.scr      Execute piglatin.scr and wait for commands.
xfst[1]: apply down       Start applying the transducer downwards.
apply down> pig
igpay
apply down> pig latin
igpay atinlay
apply down> the little brown fox jumped over the lazy dog
hetay ittlelay rownbay oxfay umpedjay over hetay azylay ogday
Hints
You will find it convenient to start your script with some definitions,
such as
define Cons [ b ] ;
define Vowel [a | e | i | o | u ] ;
define Ltr [ Cons | Vowel ]+ ;
define Limit [" " | "\t" | .#. ] ;
Because the trick in the Pig Latin transformation is the treatment of
word-initial consonants, it is better to start experimenting with just one
consonant. Once you succeed in constructing a transducer that correctly maps
"boa" into "oabay", the rest is easy.
Because you cannot literally "move" the initial consonant from the
beginning to the end of the word, you need to think of movement in
terms of two more primitive operations:
- Copy the initial consonant to the end of the word, adding "ay".
- Delete the initial consonant.
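The copy-then-delete decomposition can be illustrated outside xfst as two string rewrites applied one after the other, which is roughly what composing two replace rules does. This is only a Python analogy under stated assumptions, not the xfst solution: the function names `copy_initial`, `delete_initial`, and `translate` are hypothetical, and a word is assumed to be a maximal run of lowercase letters.

```python
import re

CONS = "bcdfghjklmnpqrstvwxyz"  # the consonants from the assignment text


def copy_initial(line: str) -> str:
    """Step 1: copy a word-initial consonant to the end of the word, adding "ay"."""
    # (?<![a-z]) asserts that no letter precedes, i.e. we are at a word start.
    return re.sub(rf"(?<![a-z])([{CONS}])([a-z]*)", r"\1\2\1ay", line)


def delete_initial(line: str) -> str:
    """Step 2: delete every word-initial consonant."""
    return re.sub(rf"(?<![a-z])[{CONS}]", "", line)


def translate(line: str) -> str:
    """Apply step 1, then step 2 (the analogue of composing two rules)."""
    return delete_initial(copy_initial(line))


print(copy_initial("boa"))  # prints: boabay
print(translate("boa"))     # prints: oabay
```

The order matters: if the deletion ran first, the information about which consonant to copy would already be lost.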
You need the definition of Limit to refer to the beginning
and end of the word.
One more important hint:
[ ] -> a || b _ c ;      Maps the string "bc" to the infinite language [ b a* c ].
[. .] -> a || b _ c ;    Maps the string "bc" to the string "bac".
To insert just one instance of something, use [. .] in your
replace expression.
Work with simple replacement and composition. Don't
try using parallel replacement.
With just one consonant, your Pig Latin translator
should have around 8 states and 44 arcs. With the full set of 21 consonants,
the size of the transducer is about 48 states and 1220 arcs.
Thanks to Bonnie Webber and Mark Steedman for suggesting this
exercise.