CIS 639 Finite-State Methods in Natural Language Processing

Spring Semester 1998

Instructor
Lauri Karttunen
IRCS 406, 573-6284, karttunen@central
Office Hours: MW 2-4
Time and Location: MW 9-10:30, Towne 303


Many basic steps in language processing, from tokenization through phonological and morphological analysis to disambiguation and shallow parsing, can be performed efficiently by finite-state transducers. The course will introduce students to the theory and technology of compiling such transducers from lexical databases, from regular expressions, and by other means.
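
For readers new to the idea, here is a minimal sketch of a finite-state transducer, written in plain Python rather than with the course software. It hand-codes a toy phonological rule, chosen only for illustration, that rewrites "n" as "m" before "p". The names ARCS, FINAL_OUTPUT, OTHER, and transduce are hypothetical and do not come from any course tool; in the course itself such machines are compiled from lexicons and regular expressions, not written by hand.

```python
# Toy transducer: rewrite "n" as "m" when the next symbol is "p".
# All names here are illustrative assumptions, not part of the course software.

OTHER = None  # wildcard: any symbol without an explicit arc from a state

# (state, input symbol) -> (next state, output prefix, echo the input symbol?)
# State 0: nothing pending. State 1: an "n" has been read but not yet output.
ARCS = {
    (0, "n"): (1, "", False),    # hold the "n" until the next symbol is known
    (0, OTHER): (0, "", True),   # copy any other symbol unchanged
    (1, "p"): (0, "m", True),    # pending "n" surfaces as "m", then copy "p"
    (1, "n"): (1, "n", False),   # emit the pending "n", hold the new one
    (1, OTHER): (0, "n", True),  # emit the pending "n", then copy the symbol
}

# Output flushed when the input ends in a given state.
FINAL_OUTPUT = {0: "", 1: "n"}


def transduce(word):
    """Map an input string to its output string, one symbol at a time."""
    state = 0
    pieces = []
    for symbol in word:
        key = (state, symbol) if (state, symbol) in ARCS else (state, OTHER)
        state, prefix, echo = ARCS[key]
        pieces.append(prefix + (symbol if echo else ""))
    pieces.append(FINAL_OUTPUT[state])
    return "".join(pieces)


print(transduce("kanpat"))  # kampat
print(transduce("kanto"))   # kanto
```

The machine uses two states and a single left-to-right pass, which is what makes this kind of processing efficient. Holding the "n" back until the following symbol is seen is one way to keep the machine deterministic; it is a design choice of this sketch, not a description of how the course tools work.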

Arabic Demo

XTAG Lexicon: now available in /pkg/cis639/lex/xtaglex.fst.

Assignments
Assignment 1: Pig Latin Translator
Assignment 2: Spanish Jerigonza Game
Assignment 3: Southern Brazilian Portuguese
Assignment 4: Two-level version of Southern Brazilian Portuguese
Assignment 5: Esperanto nouns
Assignment 6: Esperanto adjectives and nouns
Assignment 7: Esperanto verbs
Assignment 8: Numbers
Assignment 9: Flag diacritics in Esperanto
Assignment 10: Flag diacritics in Arabic
Assignment 11: Incremental finite-state parsing

Syllabus:

Software for the course

You must be a registered student or an approved auditor to have access to the software. Please add /pkg/cis639/bin to your PATH variable. You can launch the applications from there, but the directory itself is not readable.

Tutorials

Beesley & Karttunen Book (DRAFT)

Finite-State Morphology
A gentle introduction to the art of creating morphological analyzers with Xerox tools. For readers who have had some training in formal linguistics and some previous programming experience but no prior knowledge of regular expressions, automata, sets, relations, or formal language theory. The first chapters are probably too elementary for most students in this course, but some of the later sections and the exercises may be useful. This is an unfinished draft. Please do not quote or circulate. (PostScript, 377 pages, 3 MB)

karttunen@cis.upenn.edu
Last modified September 27, 1999