The Penn Treebank Project Release 2 CDROM features the new Penn Treebank II bracketing style, which is designed to allow the extraction of simple predicate/argument structure. Over one million words of text are provided with this bracketing applied, along with a complete style manual explaining the bracketing and new versions of tools for searching and processing bracketed data. This CDROM also contains all the annotated text material from the earlier Treebank Preliminary Release, including the Brown Corpus. While these materials have not all been converted to the newer bracketing style, they have been cleaned up to remove problems that appeared in the earlier release. Updated information on this release will be issued periodically.

Treebank II bracketing style is described in "The Penn Treebank: Annotating predicate argument structure."
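For readers unfamiliar with bracketed data, the following is a minimal sketch of how a labeled bracketing can be read into a nested structure and its words recovered. It is not one of the tools shipped on the CDROM. The sentence is an invented illustration rather than corpus text, the function names `parse_brackets` and `leaves` are hypothetical, and the sketch assumes every opening bracket is immediately followed by a label.

```python
# Illustrative only: an invented sentence in a Treebank-like labeled bracketing.
# The "-SBJ" suffix marks the subject, the kind of function tag that makes
# predicate/argument structure easier to extract.
EXAMPLE = "(S (NP-SBJ (DT The) (NN dog)) (VP (VBD barked)))"


def parse_brackets(text):
    """Parse one labeled bracketing into (label, children) tuples.

    Assumes every "(" is followed by a label; children are either
    nested tuples or plain word strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


tree = parse_brackets(EXAMPLE)
print(tree[0])       # S
print(leaves(tree))  # ['The', 'dog', 'barked']
```

Actual corpus files may use conventions this sketch does not handle, so the style manual remains the authority on the bracketing.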

Treebank I style (used on the Version 0.5 CDROM) is described in "Building a large annotated corpus of English: The Penn Treebank."

Detailed questions about the corpus may be sent to treebank@unagi.cis.upenn.edu, while questions about and requests for obtaining Treebank Release 2 should be sent to the Linguistic Data Consortium.