Chunking and Dependencies in XTAG Derivations

We evaluated the XTAG parser on the text chunking task [Abney 1991]. In particular, we compared the NP chunks and verb group (VG) chunks produced by the XTAG parser with the NP and VG chunks from the Penn Treebank [Marcus et al. 1993]. The test involved 940 sentences of length 15 words or less from sections 17 to 23 of the Penn Treebank, parsed using the XTAG English grammar. The results are given in Table G.3.
            NP Chunking   VG Chunking
Recall      82.15%        74.51%
Precision   83.94%        76.43%

Table G.3: Text chunking performance of the XTAG parser
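The recall and precision figures above are standard exact-match chunk scores: a system chunk counts as correct only if an identical span appears in the gold standard. A minimal sketch of that computation (the function name and the (start, end, label) span encoding are ours for illustration, not from the XTAG tools):

```python
def chunk_prf(gold_chunks, predicted_chunks):
    """Exact-match chunking precision and recall.

    Each chunk is a (start, end, label) token span; a predicted chunk
    is correct only if the identical span appears in the gold set."""
    gold = set(gold_chunks)
    pred = set(predicted_chunks)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy example (hypothetical spans for a 9-token sentence):
gold = {(0, 4, "NP"), (5, 5, "VG"), (7, 8, "NP")}
pred = {(0, 1, "NP"), (3, 4, "NP"), (5, 5, "VG"), (7, 8, "NP")}
p, r = chunk_prf(gold, pred)  # 2 of 4 predictions correct; 2 of 3 gold found
```

Here precision is 2/4 and recall is 2/3, illustrating how the two figures can diverge when the system over- or under-segments.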


System                            Training Size   Recall   Precision
Ramshaw & Marcus                  Baseline        81.9%    78.2%
Ramshaw & Marcus                  200,000         90.7%    90.5%
  (without lexical information)
Ramshaw & Marcus                  200,000         92.3%    91.8%
  (with lexical information)
Supertags                         Baseline        74.0%    58.4%
Supertags                         200,000         93.0%    91.8%
Supertags                         1,000,000       93.8%    92.5%

Table G.4: Performance comparison of the transformation-based noun chunker and the supertag-based noun chunker


As described earlier, these results cannot be directly compared with other chunking results such as those of [Ramshaw & Marcus 1995], since we do not train on the Treebank before testing. However, earlier work performed text chunking using a technique called supertagging [Srinivas 1997], which uses the XTAG English grammar and can be trained on the Treebank. Comparative text chunking results for supertagging and other chunking methods are shown in Table G.4.

We also performed experiments to determine the accuracy of the derivation structures produced by XTAG on WSJ text, where the derivation tree produced by the XTAG parser is interpreted as a dependency parse. We took sentences of 15 words or less from sections 17-23 of the Penn Treebank [Marcus et al. 1993]. 9891 of these sentences were given at least one parse by the XTAG system. Since XTAG typically produces several derivations for each sentence, we simply picked a single derivation from the list for this evaluation. Better results might be achieved by ranking the output of the parser using the sort of approach described in [Srinivas et al. 1995].

There were some striking differences between the dependencies implicit in the Treebank and those given by XTAG derivations. For instance, a subject NP in the Treebank is often linked to the first auxiliary verb in the tree, either a modal or a copular verb, whereas in the XTAG derivation the same NP is linked to the main verb. XTAG also produces some dependencies within an NP, while a large number of words in Treebank NPs are directly dependent on the verb. To normalize for these differences, we took the output of the NP and VG chunker described above and accepted as correct any dependencies that were completely contained within a single chunk. For example, for the sentence Borrowed shares on the Amex rose to another record, the XTAG and Treebank chunks are shown below.
XTAG chunks:     
 [Borrowed shares] [on the Amex] [rose] 
    [to another record] 
Treebank chunks: 
 [Borrowed shares on the Amex] [rose] 
    [to another record]
Using these chunks, we can normalize for the fact that in the dependencies produced by XTAG, borrowed is dependent on shares (i.e., within the same chunk), while in the Treebank borrowed is directly dependent on the verb rose. That is, we are looking at links between chunks, not between words. The dependencies for the sentence are given below.
XTAG dependency    Treebank dependency 
Borrowed::shares   Borrowed::rose 
shares::rose       shares::rose 
on::shares         on::shares 
the::Amex          the::Amex 
Amex::on           Amex::on 
rose::NIL          rose::NIL 
to::rose           to::rose 
another::record    another::record 
record::to         record::to
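The normalization can be sketched as follows. The token indexing, helper names, and (dependent, head) encoding below are ours for illustration, not taken from the XTAG tools; the chunks and dependencies are those of the example sentence above, with tokens numbered 0 (Borrowed) through 8 (record).

```python
def chunk_of(index, chunks):
    """Return the id of the chunk containing token `index`,
    or the bare token if it falls outside every chunk."""
    for cid, (start, end) in enumerate(chunks):
        if start <= index <= end:
            return ("chunk", cid)
    return ("tok", index)

def normalize(dependencies, chunks):
    """Map word-level (dependent, head) links to chunk-level links.
    Links wholly inside one chunk are accepted as correct outright
    and therefore dropped from the comparison."""
    links = set()
    for dep, head in dependencies:
        d, h = chunk_of(dep, chunks), chunk_of(head, chunks)
        if d != h:
            links.add((d, h))
    return links

# Chunker output for the example sentence:
# [Borrowed shares] [on the Amex] [rose] [to another record]
chunks = [(0, 1), (2, 4), (5, 5), (6, 8)]
# Word-level dependencies as (dependent, head); rose::NIL has no head.
xtag_deps  = [(0, 1), (1, 5), (2, 1), (3, 4), (4, 2), (6, 5), (7, 8), (8, 6)]
tbank_deps = [(0, 5), (1, 5), (2, 1), (3, 4), (4, 2), (6, 5), (7, 8), (8, 6)]

matched = normalize(xtag_deps, chunks) & normalize(tbank_deps, chunks)
```

After normalization the two dependency sets agree on this sentence: the Borrowed::shares vs. Borrowed::rose discrepancy disappears because Borrowed's link is intra-chunk in XTAG and chunk-to-chunk identical in the Treebank.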
After this normalization, testing simply consisted of counting how many of the dependency links produced by XTAG matched the Treebank dependency links. Due to some tokenization and subsequent alignment problems, we could only test on 835 of the original 9891 parsed sentences. There were a total of 6135 dependency links extracted from the Treebank; the XTAG parses produced 6135 dependency links for the same sentences. Of the dependencies produced by the XTAG parser, 5165 were correct, giving an accuracy of 84.2%.
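The final figure follows directly from the counts reported above:

```python
# 5165 correct links out of the 6135 produced by the XTAG parser.
correct, produced = 5165, 6135
accuracy = correct / produced
print(f"{accuracy:.1%}")  # 84.2%
```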