CIS Research Seminar
Thursday, November 5
3 PM
Wu & Chen
Speaker: Annie Louis
Automatic summarization systems must select the most important information from a given input. In our work, we aim to identify measurable indicators of the quality of the content included in a summary. What is a good summary? Why is system performance poor? I will describe two properties of the input text that are predictive of human judgements of summary quality, and discuss their implications for system development.
Most current systems use the same method to summarize any given input. We show that this approach is very inefficient, leading to variable performance from the same system on different inputs. We present analyses of data from large-scale summarization system evaluations, which show that current systems find some inputs more difficult than others. For example, average system performance on summarizing opinions is very low compared to performance on focused input types such as descriptions of an event. Specialized methods for different inputs are necessary, and it is also desirable to equip systems with knowledge about their expected performance on an input. As a first step in this direction, we demonstrate some characteristics of inputs that influence summary quality and show that difficult inputs can be identified with accuracies significantly above the baseline.
Our second experiment focuses on a system feature for identifying summary-worthy content in a document. Since summaries are expected to be surrogates of the input, similarity with the input is very likely to be a good objective function to optimize. But there are several ways to measure similarity. We present an analysis of different input-summary similarity metrics and examine which similarity scores are highly predictive of human scores for summary quality. Information-theoretic measures of similarity perform best, obtaining the highest correlations with human judgements.
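To make the idea of an information-theoretic input-summary similarity concrete, here is a minimal sketch. The abstract does not name a specific measure, so the choice of Jensen-Shannon divergence over unigram word distributions is an assumption made for illustration, as are the whitespace tokenization and the function name js_divergence. Lower values mean the summary's word distribution is closer to the input's.

```python
import math
from collections import Counter


def js_divergence(input_text, summary_text):
    """Jensen-Shannon divergence (base 2) between unigram distributions.

    Illustrative assumption: tokens are lowercased, whitespace-separated words.
    Returns a value in [0, 1]; 0 means identical word distributions.
    """
    input_tokens = input_text.lower().split()
    summary_tokens = summary_text.lower().split()
    if not input_tokens or not summary_tokens:
        raise ValueError("both texts must contain at least one token")

    input_counts = Counter(input_tokens)
    summary_counts = Counter(summary_tokens)
    vocabulary = set(input_counts) | set(summary_counts)

    divergence = 0.0
    for word in vocabulary:
        p = input_counts[word] / len(input_tokens)      # P(word | input)
        q = summary_counts[word] / len(summary_tokens)  # P(word | summary)
        m = 0.5 * (p + q)                               # mixture distribution
        # Terms with zero probability contribute nothing (0 * log 0 = 0).
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


if __name__ == "__main__":
    source = "the storm hit the coast and the storm closed roads"
    on_topic = "the storm hit the coast"
    off_topic = "stocks rose on friday"
    print(js_divergence(source, on_topic))
    print(js_divergence(source, off_topic))
```

In this toy example the on-topic summary receives a lower divergence than the off-topic one. Whether such a score tracks human judgements of quality is exactly the empirical question the talk addresses.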
This is joint work with Ani Nenkova.