General Linguistic Tools:
Empirical Methods for Multilingual Processing 'Onoring Words, Enabling Rapid Ramp-Up (EMPOWER2)
PIs:
Project Personnel
Inspired by recent advances in the
understanding of lexicon organization, we
aim to improve our machine learning
techniques for automatically producing
information processing tools from annotated
corpora for any language. Tools that can
currently be produced quickly include part-of-speech
taggers, parsers, lexicons, and noisy bilingual
lexicons. Our goal is three-fold:
- to improve the accuracy of the current
tools
- to develop new techniques and richer
annotations that will allow the automatic
production of more powerful tools such
as semantic interpreters and transfer
lexicons
- to use the new techniques in turn to
speed up the annotation process, which
accounts for the majority of the time and
labor invested.
We are engaged in work on the following
projects:
Parsing and Treebanks
Chinese TreeBank:
A corpus of Chinese text segmented
into words and annotated with part-of-speech labels and syntactic bracketing,
modeled on the English TreeBank.
Annotation guidelines have been developed based on the input of a community
of influential researchers.
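To make the annotation layers concrete, here is a minimal sketch of reading one bracketed tree and recovering its word segmentation and part-of-speech labels. It assumes a Penn Treebank-style bracketing; the example sentence, its labels, and the function names `parse_bracketed` and `tagged_words` are illustrative assumptions, not taken from the corpus or its tools.

```python
def parse_bracketed(text):
    """Parse one bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # word under a POS label
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read_node()


def tagged_words(tree):
    """Return the (word, part-of-speech) pairs at the leaves, in order."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_words(child))
    return pairs


# Hypothetical two-word example, for illustration only.
example = "(IP (NP (NR 上海)) (VP (VV 发展)))"
print(tagged_words(parse_bracketed(example)))
```

The nested tuples carry the syntactic bracketing, while the leaf pairs carry the segmentation and part-of-speech layers, so one annotated tree can serve as training data for a segmenter, a tagger, or a parser.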
- Abeille A., L. Clement, and A. Kinyon (2000)
- Abeille A., L. Clement, and A. Kinyon (2001) (not downloadable)
- Bikel, Daniel and David Chiang (2000) (.pdf version)
- Chiang, David (2000) (.pdf version)
- Kinyon, A. (2000a)
- Kinyon, A. (2000b)
- Sarkar, Anoop (2001) (Draft version -- not final copy. Please do not cite.)
- Sarkar, Anoop, Fei Xia, and Aravind Joshi (2000)
- Sarkar, Anoop and Daniel Zeman (2000)
- Xia, Fei, Chung-hye Han, Martha Palmer, and Aravind Joshi (2000)
- Xia, Fei and Martha Palmer (2000)
- Xia, Fei, Martha Palmer, and Aravind Joshi (2000)
- Xia, Fei, Martha Palmer, Nianwen Xue, Mary Ellen Okurowski, John Kovarik, Fu-Dong Chiou, Shizhe Huang, Tony Kroch, and Mitch Marcus (2000)
Lexical Semantics and Sense Tagging
VerbNet:
An enrichment of verb entries in WordNet that includes more
specific syntactic information and verb class membership. It draws heavily
on the verb-class taxonomy of Beth Levin, and on an examination of the predicate-argument
structures found to be associated with verbs in a corpus.
Senseval:
A project designed to enable the quantitative comparison of different
approaches to automated word-sense disambiguation, in English and other
languages. To help prepare the data used in the evaluation, sense
tagging tools are being developed. This exercise also provides an
opportunity to assess the validity of our lexicon's sense inventory and to
associate English word senses with those from the other evaluation
languages.
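As an illustration of what a quantitative comparison can look like, the sketch below scores one system's sense tags against a gold standard using precision over attempted instances and recall over all instances. This is one common way to score word-sense disambiguation and is an assumption here; the measures actually used in the evaluation are not described on this page, and the instance and sense identifiers are made up.

```python
def score(gold, predicted):
    """Score sense tags against a gold standard.

    gold:      dict mapping instance id -> set of acceptable sense ids
    predicted: dict mapping instance id -> one sense id
               (instances the system skipped are simply absent)
    Returns (precision, recall).
    """
    attempted = [i for i in predicted if i in gold]
    correct = sum(1 for i in attempted if predicted[i] in gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: four gold instances, three attempted by the system.
gold = {"a": {"1"}, "b": {"2"}, "c": {"1", "3"}, "d": {"2"}}
predicted = {"a": "1", "b": "1", "c": "3"}
precision, recall = score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting both numbers separates a system that answers rarely but accurately from one that answers every instance, which matters when approaches differ in how often they abstain.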
- Dang, Hoa Trang, Karin Kipper, and Martha Palmer (2000)
- Kipper, Karin, Hoa Trang Dang, and Martha Palmer (2000)
- Kipper, Karin, Hoa Trang Dang, William Schuler, and Martha Palmer (2000)
- Kipper, Karin and Martha Palmer (2000)
- Palmer, Martha, Hoa Trang Dang, and Joseph Rosenzweig (2000)
Applications: Question Answering and Machine Translation
Korean/English Machine Translation:
An application of the dependency
structure approach, as reflected in both Tree-Adjoining
Grammars and Meaning-Text Theory, to domain-specific machine translation,
in collaboration with CoGenTex and Systran.
Meetings/Reports about TIDES
The TIDES Kickoff Meeting was held in July 2000
The TIDES PI Meeting was held in October 2000
Site Visit Agenda, February 6, 2001
Natural Language Processing Research Presentation, September 21, 2001