Assignment 4: Two-level Account of Southern Brazilian Portuguese

In this assignment, the task is to write a two-level grammar that maps orthographic strings in Portuguese (the lexical side) to strings that represent their pronunciation (the surface side). A sample mapping of written "caso" to spoken "kazu" looks like this; by convention, the lexical string is on top and the surface string on the bottom.

Lexical:caso
Surface:kazu
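In two-level terms, a mapping like this is a sequence of symbol pairs, lexical:surface, where a pair whose two sides are identical is written as a single symbol. The Python sketch below is only an illustration and is not part of twolc; the function name `pair_string` is made up. It aligns the two strings character by character, which works here only because "caso" and "kazu" have the same length. Real two-level grammars also use the empty symbol 0 on one side for insertions and deletions, which this toy does not handle.

```python
def pair_string(lexical: str, surface: str) -> list[str]:
    """Align two equal-length strings into two-level pair notation.

    Identical symbols are written once (a); differing ones as lex:surf (s:z).
    This toy version has no epsilon (0) pairs, so the lengths must match.
    """
    if len(lexical) != len(surface):
        raise ValueError("this sketch only aligns strings of equal length")
    return [l if l == s else f"{l}:{s}" for l, s in zip(lexical, surface)]


print(" ".join(pair_string("caso", "kazu")))  # c:k a s:z o:u
```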
After studying the facts listed below, write your two-level grammar (named something like portuguese.rul), then compile and test it with the twolc compiler. Use the commands read-grammar, compile, and lex-test to read, compile, and test your rules.

Standard Portuguese orthography is not always a complete guide to the pronunciation of a word (especially in the case of the letter "x" and the vowels written "o" and "e"). As usual, we will restrict and simplify the data slightly to make the solution manageable as a class exercise. Later we will redo the same example using two-level rules.

The Facts to be Modeled

Testing Portuguese Pronunciation

Note that each two-level rule is a constraint that must be satisfied independently of, and simultaneously with, all the other two-level rules. A rule's context may refer to the lexical side, the surface side, or a combination of both, but there is no rule ordering.
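To make the "no rule ordering" point concrete, here is a small Python sketch (not twolc, and not the required solution) in which each rule is a predicate over a sequence of (lexical, surface) pairs, and a mapping is accepted only if every rule accepts it. The three rules are hypothetical toy versions suggested by the sample pairs in this handout (s:z and o:u in caso, ç:s in braço); their contexts are checked on the lexical side only, and the rule notation in the docstrings is schematic. Because nothing is rewritten, reordering the rule list cannot change the result.

```python
# Each hypothetical rule is a predicate over a pair string: a sequence of
# (lexical, surface) tuples. A mapping is accepted only if every rule
# accepts it; nothing is rewritten, so the order of the rules is irrelevant.
VOWELS = set("aeiou")


def intervocalic_s(pairs):
    """Toy version of  s:z <=> V _ V  (context checked on the lexical side)."""
    for i, (lex, surf) in enumerate(pairs):
        if lex == "s":
            between = (0 < i < len(pairs) - 1
                       and pairs[i - 1][0] in VOWELS
                       and pairs[i + 1][0] in VOWELS)
            # <=> : z is required inside the context and forbidden outside it.
            if between != (surf == "z"):
                return False
    return True


def final_o(pairs):
    """Toy version of  o:u <=> _ (s) #  (word-final o, optionally before s)."""
    last = len(pairs) - 1
    for i, (lex, surf) in enumerate(pairs):
        if lex == "o":
            at_end = i == last or (i == last - 1 and pairs[last][0] == "s")
            if at_end != (surf == "u"):
                return False
    return True


def cedilla(pairs):
    """Toy constraint: lexical ç must surface as s."""
    return all(surf == "s" for lex, surf in pairs if lex == "ç")


RULES = [intervocalic_s, final_o, cedilla]


def accepts(lexical, surface, rules=RULES):
    """True if every rule accepts the pairing (equal-length strings only)."""
    pairs = list(zip(lexical, surface))
    return all(rule(pairs) for rule in rules)


print(accepts("caso", "kazu"))      # True
print(accepts("caso", "kasu"))      # False: intervocalic s must be z
print(accepts("braços", "brasus"))  # True
```

Note that this toy never constrains c:k at all; in a real grammar the set of feasible pairs is restricted as well, so an unlicensed pair would be rejected.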

Use the command lex-test to test individual words. Because of the accented characters, it is best to run the tests in an Emacs buffer. To create one, use the command M-x shell.

twolc> lex-test
Lexical string ('q' = quit): braço
brasu
b
r
a
ç:s
o:u
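Conceptually, lex-test takes a lexical string and returns the surface strings that all the rules allow, listing the surface form followed by one pair per line, as in the transcript above. The following Python sketch imitates that by brute force: it tries every combination of a hypothetical table of feasible pairs (`FEASIBLE`) and keeps the combinations every rule accepts. This is not how twolc works internally (it compiles the rules into transducers), and the enumeration is exponential; it only illustrates what the command computes. The two rules are toy versions, not the assignment's solution.

```python
from itertools import product

VOWELS = set("aeiou")

# Hypothetical feasible pairs: the surface symbols each lexical symbol may
# map to. Symbols not listed here map only to themselves.
FEASIBLE = {"ç": ["s"], "o": ["o", "u"], "s": ["s", "z"]}


def intervocalic_s(pairs):
    """Toy  s:z <=> V _ V  with lexical-side context."""
    for i, (lex, surf) in enumerate(pairs):
        if lex == "s":
            between = (0 < i < len(pairs) - 1
                       and pairs[i - 1][0] in VOWELS
                       and pairs[i + 1][0] in VOWELS)
            if between != (surf == "z"):
                return False
    return True


def final_o(pairs):
    """Toy  o:u <=> _ (s) #  with lexical-side context."""
    last = len(pairs) - 1
    for i, (lex, surf) in enumerate(pairs):
        if lex == "o":
            at_end = i == last or (i == last - 1 and pairs[last][0] == "s")
            if at_end != (surf == "u"):
                return False
    return True


RULES = [intervocalic_s, final_o]


def lex_test(lexical, rules=RULES):
    """Return every pair sequence for `lexical` that all rules accept."""
    options = [[(ch, s) for s in FEASIBLE.get(ch, [ch])] for ch in lexical]
    return [list(p) for p in product(*options) if all(r(p) for r in rules)]


def show(pairs):
    """Format a result like the transcript: surface, then one pair per line."""
    surface = "".join(s for _, s in pairs)
    return "\n".join([surface] + [l if l == s else f"{l}:{s}" for l, s in pairs])


for result in lex_test("braço"):
    print(show(result))  # brasu, then b, r, a, ç:s, o:u on separate lines
```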

To make testing easier, you can type all the input (upper-side) words into a file, called something like portuguese.words, and use the command lex-test-file to read the input strings from that file. The command prompts for the names of the input and output files; as the transcript below shows, the input file name can also be given on the command line.

twolc> lex-test-file portuguese.words
Output file (- = stdout) [cancel]): portuguese.out
........................
twolc>
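If you also want to batch-test words outside twolc (for example, against a prototype transcriber of your own), the same workflow can be sketched in Python. This is a hypothetical helper, not a twolc feature: `batch_transcribe` and the `transcribe` callback are made-up names, the one-line "word<TAB>surface" output format is an assumption, and the UTF-8 default encoding is also an assumption (an older words file may use Latin-1). As in the twolc prompt, "-" sends the output to stdout.

```python
import sys
from typing import Callable, Iterable


def read_words(text: str) -> list[str]:
    """Split a words file into words; spaces and newlines both separate words."""
    return text.split()


def run_words(words: Iterable[str], transcribe: Callable[[str], str]) -> list[str]:
    """Produce one 'word<TAB>surface' line per input word."""
    return [f"{word}\t{transcribe(word)}" for word in words]


def batch_transcribe(words_path: str, out_path: str,
                     transcribe: Callable[[str], str],
                     encoding: str = "utf-8") -> int:
    """Transcribe every word in words_path; '-' writes to stdout.

    Returns the number of words processed.
    """
    with open(words_path, encoding=encoding) as f:
        words = read_words(f.read())
    text = "".join(line + "\n" for line in run_words(words, transcribe))
    if out_path == "-":
        sys.stdout.write(text)
    else:
        with open(out_path, "w", encoding=encoding) as out:
            out.write(text)
    return len(words)
```

A call such as `batch_transcribe("portuguese.words", "portuguese.out", my_transcriber)` would write one line per test word, where `my_transcriber` stands for whatever hypothetical function you are testing.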

The file /mnt/linc/ftp/pub/cis639/assign/portuguese.words contains the following test words. Be sure that you handle them all correctly.

disse peru pedaço livro parte parede sabe cada
simpático verdade casa braço chato vermelho gatinho filhos
luz case braços partes paredes me antes ninhos

Be sure to test ALL the examples to make sure that your rules really work as they should. Modify your rules and rerun the test words until the grammar handles every one of them correctly.
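One way to keep this edit-and-retest loop honest is to write down the pronunciation you expect for each test word and compare it with what your grammar actually produces after every change. The Python sketch below does only that comparison; the name `check` is hypothetical, and the only expected forms shown are the two given in this handout (caso/kazu and braço/brasu). Filling in the rest is part of the exercise.

```python
def check(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Report every word whose actual surface form differs from the expected one."""
    problems = []
    for word, want in expected.items():
        got = actual.get(word)
        if got != want:
            shown = got if got is not None else "no output"
            problems.append(f"{word}: expected {want}, got {shown}")
    return problems


expected = {"caso": "kazu", "braço": "brasu"}
print(check(expected, {"caso": "kazu", "braço": "braso"}))
# ['braço: expected brasu, got braso']
```

An empty list means every listed word came out as expected.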



This exercise is from Beesley & Karttunen, Finite-state Technology (forthcoming).
Copyright © 1997 Xerox Corporation.