
TSNLP

In addition to corpus-based evaluation, we have also run the English Grammar on the Test Suites for Natural Language Processing (TSNLP) English corpus [Lehmann96]. The corpus is intended to be a systematic collection of English grammatical phenomena, including complementation, agreement, modification, diathesis, modality, tense and aspect, sentence and clause types, coordination, and negation. It contains 1409 grammatical sentences and phrases and 3036 ungrammatical ones.

Table: Breakdown of TSNLP Errors

Error Class	% of errors	Example
POS Tag	19.7%	She adds to/V it , He noises/N him abroad
Missing lex item	43.3%	used as an auxiliary V, calm NP down
Missing tree	21.2%	should've, bet NP NP S, regard NP as Adj
Feature clashes	3%	My every firm, All money
Rest	12.8%	approx, e.g.

We judged 42 examples ungrammatical and removed them from the test corpus. These were sentences with conjoined subject pronouns in which one or both were accusative, e.g. Her and him succeed. Overall, we parsed 61.4% of the 1367 remaining sentences and phrases. The errors were of various types, broken down in the table above (Breakdown of TSNLP Errors). As with the error analysis described above, we used this information to help direct our grammar development efforts. It also highlighted the fact that our grammar is heavily slanted toward American English: it did not handle dare or need as auxiliary verbs, and there were a number of very British particle constructions, e.g. She misses him out.

One general problem with the test suite is that it uses a very restricted lexicon, so a single problematic lexical item is likely to appear many times and cause a disproportionate number of failures. Used to, for example, appears 33 times, and we got all 33 wrong. Conversely, the XTAG grammar has analyses for syntactic phenomena that are not represented in the TSNLP test suite, such as sentential subjects and subordinating clauses, among others. This effort was therefore useful in highlighting some deficiencies in our grammar, but it did not provide the same sort of general evaluation as parsing corpus data.
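The figures quoted in this section can be cross-checked with a few lines of arithmetic. The following is a minimal sketch and not part of the XTAG tools. The variable and function names are our own illustrative choices, and the parsed and unparsed counts it derives are only approximations, because the 61.4% figure is rounded.

```python
# Figures quoted in the text.
GRAMMATICAL_ITEMS = 1409         # grammatical sentences and phrases in TSNLP
REMOVED_AS_UNGRAMMATICAL = 42    # items we judged ungrammatical and removed
PARSED_FRACTION = 0.614          # reported as 61.4%, i.e. a rounded value

# Error-class shares from the breakdown table, in percent.
ERROR_SHARES = {
    "POS Tag": 19.7,
    "Missing lex item": 43.3,
    "Missing tree": 21.2,
    "Feature clashes": 3.0,
    "Rest": 12.8,
}


def remaining_items(total, removed):
    """Size of the test corpus after removing the rejected items."""
    return total - removed


def approx_count(fraction, total):
    """Approximate item count implied by a rounded fraction of a total."""
    return round(fraction * total)


remaining = remaining_items(GRAMMATICAL_ITEMS, REMOVED_AS_UNGRAMMATICAL)
parsed = approx_count(PARSED_FRACTION, remaining)
unparsed = remaining - parsed

# Round the sum to one decimal place to avoid floating-point noise.
share_total = round(sum(ERROR_SHARES.values()), 1)

print(f"remaining items:      {remaining}")
print(f"approx. parsed:       {parsed}")
print(f"approx. unparsed:     {unparsed}")
print(f"error shares sum to:  {share_total}%")
```

The check confirms that 1409 minus 42 gives the 1367 items reported, and that the error classes in the table account for all of the errors. The parsed count of about 839 should be read as an estimate, since more than one exact count rounds to 61.4%.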


XTAG Project
1998-09-14