Assignment 8: Numbers

The set of decimal numbers is obviously a regular language. The expression
    [ %0 | [1|2|3|4|5|6|7|8|9] [%0|1|2|3|4|5|6|7|8|9]* ]
denotes the infinite set {0,1,2, ... 10, 11, ... 1998, ...}. It excludes expressions with leading zeros such as 01.
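
For readers without xfst at hand, the same language can be checked with an ordinary regular expression. The following is a minimal Python sketch, not part of the attached script; the names DECIMAL and is_decimal are hypothetical and chosen only for illustration.

```python
import re

# Python analogue of the xfst expression above: either a lone zero,
# or a nonzero digit followed by any number of digits.
# DECIMAL and is_decimal are illustrative names, not from numbers.xfst.
DECIMAL = re.compile(r"0|[1-9][0-9]*")


def is_decimal(s: str) -> bool:
    """Return True if s is a decimal number without leading zeros."""
    return DECIMAL.fullmatch(s) is not None
```

With this definition, is_decimal("1998") holds, while is_decimal("01") does not, because the first alternative accepts only a lone zero.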

The corresponding set of numerals in English and in other languages, {zero, one, two, ... ten, eleven, ... one thousand nine hundred ninety-eight, ...}, is of course also a regular language.

This exercise is about defining the two numeral languages simultaneously as a relation so that each decimal number is linked to its counterpart in a natural language.

The attached script,

/mnt/linc/ftp/pub/cis639/public_html/assign/numbers.xfst

creates a transducer that maps decimal numbers (up to a billion) into English numerals, and vice versa. Run it with the command

    xfst -l numbers.xfst
Here is an example of the kind of output it produces when applied:
    apply down> 1998
    one thousand nine hundred ninety-eight

    apply up> twelve thousand three hundred forty-five
    12345
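
To make the "apply down" direction concrete, here is a rough Python sketch of the same mapping from decimal numbers to English numerals. It is not the attached xfst script and says nothing about how that script is organized; the function names are hypothetical, and the range 0 to 999,999,999 is an assumption about what "up to a billion" means.

```python
# Illustrative only: a plain function, not a transducer, so it covers
# just the number-to-numeral direction.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_hundred(n: int) -> str:
    """Numeral for 1..99; tens and ones are joined with a hyphen."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def below_thousand(n: int) -> str:
    """Numeral for 1..999."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest:
        parts.append(below_hundred(rest))
    return " ".join(parts)


def to_english(n: int) -> str:
    """Numeral for 0..999,999,999 (assumed upper bound)."""
    if not 0 <= n < 1_000_000_000:
        raise ValueError("number out of the assumed range")
    if n == 0:
        return "zero"
    millions, rest = divmod(n, 1_000_000)
    thousands, units = divmod(rest, 1_000)
    parts = []
    if millions:
        parts.append(below_thousand(millions) + " million")
    if thousands:
        parts.append(below_thousand(thousands) + " thousand")
    if units:
        parts.append(below_thousand(units))
    return " ".join(parts)
```

Unlike the transducer, this function runs in one direction only; the appeal of the finite-state formulation is that a single relation serves for both "apply down" and "apply up".
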
Your task (by Monday) is to
  1. Create a similar transducer for some other language. Pick one that is not as easy as English.
  2. Compose your transducer with the one created by the attached script to make a transducer that translates numeral expressions from English into your language and vice versa.
For example, an English/French transducer should map
    ninety-eight
into
    quatre vingt dix huit
and vice versa. An English/German transducer must flip the tens and the ones:
    acht und neunzig
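
The idea behind the composition step can be sketched in Python with finite relations stored as dictionaries. This is only an illustration for the numbers 0 to 99, under the assumption that German numerals are written with spaces as in the example above; the helpers invert and compose and all other names are hypothetical, and a real solution would compose xfst transducers instead.

```python
EN_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
           "eight", "nine", "ten", "eleven", "twelve", "thirteen",
           "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
           "nineteen"]
EN_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
           "seventy", "eighty", "ninety"]
DE_ONES = ["null", "eins", "zwei", "drei", "vier", "fünf", "sechs",
           "sieben", "acht", "neun", "zehn", "elf", "zwölf", "dreizehn",
           "vierzehn", "fünfzehn", "sechzehn", "siebzehn", "achtzehn",
           "neunzehn"]
DE_TENS = ["", "", "zwanzig", "dreißig", "vierzig", "fünfzig", "sechzig",
           "siebzig", "achtzig", "neunzig"]


def english(n: int) -> str:
    """English numeral for 0..99: tens first, then ones."""
    if n < 20:
        return EN_ONES[n]
    tens, ones = divmod(n, 10)
    return EN_TENS[tens] if ones == 0 else f"{EN_TENS[tens]}-{EN_ONES[ones]}"


def german(n: int) -> str:
    """German numeral for 0..99: ones first, then 'und', then tens."""
    if n < 20:
        return DE_ONES[n]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return DE_TENS[tens]
    unit = "ein" if ones == 1 else DE_ONES[ones]  # 'eins' shortens to 'ein'
    return f"{unit} und {DE_TENS[tens]}"


def invert(rel: dict) -> dict:
    """Swap the two sides of a one-to-one relation."""
    return {v: k for k, v in rel.items()}


def compose(first: dict, second: dict) -> dict:
    """Pair a with c whenever first maps a to b and second maps b to c."""
    return {a: second[b] for a, b in first.items() if b in second}


# Two relations that share the decimal number as their common side.
NUM_TO_EN = {str(n): english(n) for n in range(100)}
NUM_TO_DE = {str(n): german(n) for n in range(100)}

# English -> decimal -> German; the decimal side disappears in the result.
EN_TO_DE = compose(invert(NUM_TO_EN), NUM_TO_DE)
DE_TO_EN = invert(EN_TO_DE)
```

Here EN_TO_DE["ninety-eight"] yields "acht und neunzig", and DE_TO_EN maps it back. The decimal number serves only as the shared intermediate level that the composition eliminates.
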
Here is a script for English:

This exercise will be in Beesley & Karttunen, Finite-state Technology (forthcoming).
Copyright © 1998 Xerox Corporation.