Assignment 5: Esperanto nouns

A Little History

Esperanto was constructed by Dr. L. L. Zamenhof, a Polish oculist, and was first described in a Russian-language publication in 1887. Another constructed language called Volapük was already enjoying some popularity, but most of its speakers deserted to join the new Esperanto movement, which showed impressive vitality up until World War I, in which an estimated 50 percent of all Esperanto speakers died. The movement has never really recovered, but it continues today among a speaking population estimated at anywhere from one to eight million, the figure being hotly contested. In over a century of activity, most of the world's literary classics have been translated into Esperanto, and it also boasts a considerable original literature.

While Esperanto is often charged with being too strongly based on Romance languages, it has also helped inspire Latino sine flexione, Interlingua and several other languages whose inventors believe strongly that Esperanto isn't Romance enough. Despite predictions that non-Indo-Europeans would never accept it, China and Japan are relative Esperanto strongholds. It has been accused of being "too inflected", although in formal linguistic terms it is not an inflecting language at all but is almost purely agglutinating. Language construction is definitely one of those activities where you can't please everyone. In any case, whatever anyone thinks of it as a viable human language, Esperanto's simplicity and regularity make it a perfect subject for initial exercises in lexc: we can model much of the language without any twolc or replace rules at all.

In this exercise, you will use lexc to model a portion of the Esperanto noun system. The facts of the noun sublanguage have been limited to keep the exercise easily manageable. If you know some Esperanto, don't worry too much right now about the gaps and overgeneration; some of them will be fixed in future exercises.

The Facts

The Task

Based on the above simplified facts, write a grammar of Esperanto nouns using lexc. Put your grammar in a file named something like esperanto.lexc, and start it by declaring the multicharacter symbols that you use. These symbols should appear on the upper side of the lexicon only, each realized as the corresponding suffix on the lower side. For example, to associate the symbol +NSuff with the noun marker o, you need an entry somewhere that contains the pair +NSuff:o.
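To make the upper-side/lower-side pairing concrete, here is a small Python sketch of what chained lexicon entries mean. It is not lexc and not a solution to the exercise: the root hund and the sublexicon names are hypothetical, chosen only for illustration, and only the pair +NSuff:o comes from the text above.

```python
# Toy model of a lexc-style lexicon (an illustration, not real lexc).
# Each sublexicon holds (upper, lower, continuation) entries; "#" ends a word.
# The root "hund" and the names "Nouns"/"NounSuffix" are hypothetical.
LEXICON = {
    "Root": [("", "", "Nouns")],
    "Nouns": [("hund", "hund", "NounSuffix")],
    "NounSuffix": [("+NSuff", "o", "#")],  # the pair +NSuff:o
}


def expand(name="Root", upper="", lower=""):
    """Yield every (upper, lower) string pair reachable from sublexicon `name`."""
    if name == "#":
        yield upper, lower
        return
    for up, low, continuation in LEXICON[name]:
        yield from expand(continuation, upper + up, lower + low)


pairs = list(expand())
print(pairs)  # [('hund+NSuff', 'hundo')]
```

The multicharacter symbol shows up only in the upper string, while the lower string carries the surface suffix. That is the relationship your lexc entries should encode.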

After creating the source file, start lexc by typing lexc at the command line.

At the lexc prompt, enter the command that compiles your source file. If the lexc compiler finds errors, it will print suitable messages that you should read carefully. Fix any errors in your source file using your text editor, and recompile the source file until it compiles cleanly.

Testing the Result

The commands random-lex and random-surf give you a general idea of whether the transducer that you have constructed came out as you expected. The commands lookup and lookdown are just like apply up and apply down in xfst, but less convenient to use because they expect a single word as input.

The best way to do a systematic test is to use the check-all command. However, in order to do that, you need both a "source" and a "result". To get a result, you need to compose the compiled source network with some rule transducer(s). Because we are assuming here that there are no morphological alternations in Esperanto, the appropriate transducer is the universal identity relation, which you can compile in xfst or in twolc from the regular expression [ ?* ].

Having compiled and saved the trivial rule transducer, load it with the command read-rules. Then apply it to the source network with the command compile-result. You are now ready to use check-all.

The output from the check-all operation is controlled by three flags: singles, duplicates, and failures. The default setting for singles is OFF. You should toggle it ON.

If the singles switch is OFF, check-all does not print any string on the upper side of the source lexicon that uniquely maps to a single string on the lower side of the result network. Here we do want to see all one-to-one pairs. (If your lexicon is correctly constructed, there are no duplicates or failures.)

Although your lexicon is infinite, don't worry: check-all is clever enough not to go into an infinite loop. Here is what you should see:

Because check-all is determined not to go into a loop, it does not follow arcs that lead to a state that is already on the current path. In the case at hand, we do not see any of the optional in, eg, and et suffixes in the check-all output. You have to verify them "manually".
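The loop-avoiding behaviour described here can be sketched in a few lines of Python. This is a toy illustration, not the lexc implementation: the three-state network, the root hund, and the modelling of the optional et suffix as a self-loop are all assumptions made for the example.

```python
# Toy network: state -> list of (label, next_state). "F" is the final state.
# The self-loop on "S" stands in for an optional, repeatable suffix (assumed).
ARCS = {
    "R": [("hund", "S")],
    "S": [("et", "S"), ("o", "F")],
    "F": [],
}
FINAL = {"F"}


def paths(state, on_path=(), labels=()):
    """Enumerate words, never following an arc into a state already on the path."""
    on_path = on_path + (state,)
    if state in FINAL:
        yield "".join(labels)
    for label, nxt in ARCS[state]:
        if nxt in on_path:  # would revisit a state on the current path: skip
            continue
        yield from paths(nxt, on_path, labels + (label,))


def accepts(labels, state="R"):
    """Check one given word, the analogue of verifying it by hand."""
    for label in labels:
        state = dict(ARCS[state]).get(label)
        if state is None:
            return False
    return state in FINAL


found = list(paths("R"))
print(found)                         # ['hundo']
print(accepts(["hund", "et", "o"]))  # True
```

The enumeration terminates and reports only hundo, although the network also accepts hundeto. This is why forms with the optional suffixes have to be tested one word at a time.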

Now go back to the lexc source file and add the following new noun roots, with appropriate continuation classes. Recompile and test again.

This exercise is from Beesley & Karttunen, Finite-State Technology (forthcoming).
Copyright © 1997 Xerox Corporation.