General Linguistic Tools:
Empirical Methods for Multilingual Processing 'Onoring Words, Enabling Rapid Ramp-Up (EMPOWER2)


PIs:

Aravind Joshi, Anthony Kroch, Mark Liberman,
Mitch Marcus, Martha Palmer, Lyle Ungar

Project Personnel

Inspired by recent advances in the understanding of lexicon organization, our objective is to improve our machine learning techniques for automatically producing information processing tools from annotated corpora for any language. Tools that can currently be produced quickly include part-of-speech taggers, parsers, lexicons, and noisy bilingual lexicons. Our goal is three-fold:
  1. to improve the accuracy of the current tools
  2. to develop new techniques and richer annotations that will allow the automatic production of more powerful tools such as semantic interpreters and transfer lexicons
  3. to use the new techniques in turn to speed up the annotation process, which accounts for most of the time and labor invested.
We are engaged in work on the following projects:

Parsing and Treebanks

Chinese TreeBank: A corpus of Chinese text segmented into words and annotated with part-of-speech labels and syntactic bracketing, modeled on the English TreeBank. Annotation guidelines have been developed based on the input of a community of influential researchers.

Lexical Semantics and Sense Tagging

VerbNet: An enrichment of verb entries in WordNet that includes more specific syntactic information and verb class membership. It draws heavily on the verb-class taxonomy of Beth Levin, and on an examination of the predicate-argument structures found to be associated with verbs in a corpus.

Senseval: A project designed to enable the quantitative comparison of different approaches to automated word-sense disambiguation, in English and other languages. To help prepare the data used in the evaluation, sense tagging tools are being developed. This exercise also provides an opportunity to assess the validity of our lexicon's sense inventory and to associate English word senses with those from the other evaluation languages.


Applications: Question Answering and Machine Translation

Korean/English Machine Translation: An application of the dependency structure approach, as reflected in both Tree-Adjoining Grammars and Meaning-Text Theory, to domain-specific machine translation, in collaboration with CoGenTex and Systran.



Meetings/Reports about TIDES

The TIDES Kickoff Meeting was in July 2000

The TIDES PI Meeting was in October 2000

Site Visit Agenda, February 6, 2001

Natural Language Processing Research Presentation, September 21, 2001

Site Visit Agenda, March 11, 2002