LTAG-spinal: Treebank and parsers

A new resource for incremental, dependency and semantic parsing

By Libin Shen, Lucas Champollion, Aravind K. Joshi and Prashanth Mannem

Department of Computer and Information Science
University of Pennsylvania

LTAG-spinal is a novel variant of traditional Lexicalized Tree Adjoining Grammar (LTAG) with desirable linguistic, computational and statistical properties. Unlike in traditional LTAG, subcategorization frames and the argument-adjunct distinction are left underspecified in LTAG-spinal. With adjunction constraints, this formalism is weakly equivalent to LTAG. LTAG-spinal provides a desirable resource for statistical LTAG parsing, incremental parsing, dependency parsing, and semantic parsing.

This page is the companion website to the following dissertation:

Statistical LTAG Parsing. Libin Shen (2006). Ph.D. thesis. PDF.

For more recent information, please refer to the papers linked below.

Jump directly down to:
  1. Treebank
  2. Parsers
  3. New! Java API
  4. Contact us

LTAG-spinal Treebank

We extracted an LTAG-spinal Treebank from the Penn Treebank and harmonized it with PropBank. Based on the PropBank annotation, we successfully extracted predicate coordination and LTAG adjunction structures. The LTAG-spinal Treebank makes explicit semantic relations that are implicit in or absent from the original Penn Treebank.

LTAG-spinal Parsers

We have used this treebank to train two novel statistical incremental parsers: a left-to-right parser that produces full LTAG-spinal annotation, and a bidirectional parser that produces derivation trees without spines (similar to the output of a dependency parser). Both achieve competitive results on our treebank, with the latter significantly improving over the former. As far as we know, these parsers are the first comprehensive attempt at efficient statistical parsing with a formal grammar whose generative power is provably stronger than that of CFG.
We have also developed a POS tagger using the bidirectional search strategy. The output of this POS tagger can be used as the input to the parsers after a simple tag mapping. (The POS tagger is trained on the CoNLL standard data set, so we need to map ( to LRB and ) to RRB to make its output compatible with the Penn Treebank and LTAG-spinal Treebank annotation.)
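
The tag mapping described above could be sketched in Java roughly as follows. This is a minimal illustration and not part of the LTAG-spinal API: the class name TagMapper, its methods, and the assumption that the tagger's output is available as an array of tag strings are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the LTAG-spinal API.
class TagMapper {
    // The two bracket tags that differ between the tagger output and the
    // treebank annotation, as described in the text above.
    private static final Map<String, String> BRACKET_TAGS = new HashMap<String, String>();
    static {
        BRACKET_TAGS.put("(", "LRB");
        BRACKET_TAGS.put(")", "RRB");
    }

    // Maps a single POS tag; every tag other than "(" and ")" is returned unchanged.
    static String mapTag(String tag) {
        String mapped = BRACKET_TAGS.get(tag);
        return mapped != null ? mapped : tag;
    }

    // Maps a whole tag sequence (assumed here to be one tag per token).
    static String[] mapTags(String[] tags) {
        String[] out = new String[tags.length];
        for (int i = 0; i < tags.length; i++) {
            out[i] = mapTag(tags[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tags = {"DT", "NN", "(", "NN", ")", "VBZ"};
        // Prints the mapped tags separated by spaces.
        System.out.println(String.join(" ", mapTags(tags)));
    }
}
```

Only the two bracket tags are rewritten; how the tags are read from the tagger's output and passed to the parsers depends on the actual file formats, which this sketch does not assume.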

New! Java API

We have developed a comprehensive API in the Java programming language (compatible with Java 1.4 or higher). The API provides full read access to the data structures of the LTAG-spinal Treebank, the modified version of PropBank, and the output of the two parsers. The API is licensed under GNU GPL v.3. Please contact us if this license does not meet your particular needs.


Browse the API documentation online
This documentation is linked to the source code. Click on the method names to view it.

Acknowledgements

We are grateful to Ryan Gabbard, who contributed to the code for the LTAG-spinal API. We thank Martha Palmer for generously providing the PropBank API (originally written by Scott Cotton) for us to include in our treebank API. We also thank Julia Hockenmaier, Mark Johnson, Yudong Liu, Mitch Marcus, Sameer Pradhan, Anoop Sarkar, and the CLRG and XTAG groups at Penn for helpful discussions.

Contact us

For all inquiries, feel free to contact Lucas Champollion:


Page maintained by Lucas Champollion
Last modified: 12/23/2007
