Assignment 3: Southern Brazilian Portuguese

In this assignment, the task is to create a rule system that maps orthographic strings in Portuguese (the lexical level) to strings that represent their pronunciation (the surface level). A sample mapping of written "caso" to spoken "kazu" is shown below; by convention, the lexical string is on top and the surface string is on the bottom.

Lexical: caso
Surface: kazu

After studying the facts listed below, write your grammar as an xfst script (named something like portuguese.scr), compile it, and test it using the apply down command in xfst.
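
As a rough illustration of what such a script might look like, here is a minimal sketch that handles only the "caso" example above. The three rules and their names (VoiceS, FinalO, HardC) are assumptions inferred from that single pair, not the facts of the assignment, and they are far from a complete grammar.

```xfst
! portuguese.scr -- hypothetical contents, for illustration only
clear stack

define Vowel  [ a | e | i | o | u ] ;

! Assumed rule: s is voiced to z between vowels
define VoiceS [ s -> z || Vowel _ Vowel ] ;

! Assumed rule: word-final o is pronounced u
define FinalO [ o -> u || _ .#. ] ;

! Assumed rule: c is pronounced k before a, o or u
define HardC  [ c -> k || _ [ a | o | u ] ] ;

! Compose the rules into one transducer and push it on the stack
read regex VoiceS .o. FinalO .o. HardC ;

! Lexical "caso" should come out as surface "kazu"
apply down caso
```

From inside xfst, a script file like this can be run with the source command (source portuguese.scr), after which apply down can be used interactively on further words.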

Standard Portuguese orthography is not always a complete guide to the pronunciation of a word (especially in the case of the letter "x" and the vowels written "o" and "e"). As usual, we will restrict and simplify the data slightly to make the solution manageable as a class exercise. Later we will redo the same example using two-level rules.

The Facts to be Modeled

Testing Portuguese Pronunciation

Write a set of rules that performs the mappings indicated. As in the kaNpat example, the rules should be organized in a cascade, with the composition operator (.o.) between the rules. Be very careful about ordering your rules correctly; the rules cannot be expressed in exactly the same order as the facts listed just above. Compile the rules using the read regex utility in xfst and test them using the apply down utility.
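
To see why ordering matters, consider the following sketch. It uses two rules that are assumed here purely for illustration, inferred from the sample pair parede / pareJi rather than taken from the fact list: word-final e becomes i, and d becomes J before i. The rule names FinalE and PalD are hypothetical.

```xfst
! Assumed rule: word-final e is pronounced i
define FinalE [ e -> i || _ .#. ] ;

! Assumed rule: d is pronounced J before i
define PalD   [ d -> J || _ i ] ;

! Order 1: FinalE applies first, so PalD sees the new i
clear stack
read regex FinalE .o. PalD ;
apply down parede
! result: pareJi

! Order 2: PalD applies first, while d is still followed by e
clear stack
read regex PalD .o. FinalE ;
apply down parede
! result: paredi
```

In the first cascade the output of FinalE feeds PalD, so the d is followed by i by the time PalD applies. In the second cascade PalD has nothing to match, and the d survives unchanged. The same kind of interaction can arise between any two rules in which one creates or destroys the context of the other.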

You should be able to handle the following examples, entering the lexical (top) string in each case and getting back the surface (bottom) string. (The zeros are not shown here and should not appear in your output.) To facilitate the testing, you can type all the input (upper-side) words into a file, called something like mydata, and tell apply down to read the various input strings from that file.

xfst[1] apply down < mydata
disse      peru       pedaço     livro      parte      parede     sabe       cada
Jisi       piru       pedasu     livru      parCi      pareJi     sabi       kada

simpático  verdade    casa       braço      chato      vermelho   gatinho    filhos
simpáCiku  verdaJi    kaza       brasu      $atu       vermeLu    gaCiNu     fiLus

luz        case       braços     partes     paredes    me         antes      ninhos
lus        kazi       brasus     parCis     pareJis    mi         anCis      niNus

Be sure to test ALL the examples to make sure that your rules are really working as they should. Modify your rules and re-apply the input words until the grammar is working perfectly.



This exercise is from Beesley & Karttunen, Finite-state Technology (forthcoming).
Copyright © 1997 Xerox Corporation.