[xtag-meeting]: Talk on active learning/statistical parsing

Rebecca Hwa from the Univ. of Maryland will be speaking at next week's
XTAG meeting (24 Jan, 10:30 AM, IRCS fishbowl).

Sample selection for parser induction

Many corpus-based natural language processing systems rely on large
quantities of annotated text as training examples. Building this kind
of resource is expensive and labor-intensive.
Sample selection is a machine learning technique that attempts to
minimize the number of training examples by asking people to annotate
only those examples with the greatest potential to improve the system.
In this talk, I will address the challenges in applying sample selection
to training parsers: what is an effective metric for
selecting informative examples to train parsers; does the metric work
for different kinds of parsers; and are the selected examples good for
training other parsers?  I will present empirical results showing that
selection using the tree-entropy metric can significantly decrease
the number of training examples needed for both a history-based and an
EM-based parser.
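
To make the selection idea above concrete, here is a minimal sketch,
not the speaker's implementation. It assumes that "tree entropy" means
the Shannon entropy of the current parser's probability distribution
over the candidate parses of a sentence, and that the sentences with
the highest entropy are the ones sent to annotators. The function
names and the input format (a dict from sentence to a list of parse
scores) are hypothetical.

```python
import math


def tree_entropy(parse_scores):
    """Entropy in bits of the distribution over candidate parses.

    parse_scores: non-negative scores, one per candidate parse
    (assumed input format); they are normalized to sum to 1.
    """
    total = sum(parse_scores)
    if total <= 0:
        raise ValueError("need at least one parse with positive score")
    entropy = 0.0
    for score in parse_scores:
        if score > 0:
            p = score / total
            entropy -= p * math.log2(p)
    return entropy


def select_for_annotation(candidates, batch_size):
    """Return the batch_size sentences the parser is least certain about.

    candidates: dict mapping each unlabeled sentence to its list of
    parse scores under the current model (hypothetical structure).
    """
    ranked = sorted(
        candidates,
        key=lambda sentence: tree_entropy(candidates[sentence]),
        reverse=True,
    )
    return ranked[:batch_size]
```

A sentence whose parses are all about equally likely scores high and is
selected; a sentence with one dominant parse scores near zero and is
skipped. A practical variant might divide by the log of the number of
candidate parses so that long sentences are not favored automatically;
whether the talk does so is not stated in the abstract.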