Ani Nenkova
Computer Science Department
University of Pennsylvania
"Modeling text quality in newspaper text and machine translation"
Abstract
What are the characteristics of well-written text? People have strong intuitions about this but can rarely give a precise answer. Nor do general computational models of text quality exist, even though such models are a critical component of a range of text-producing applications such as summarization, machine translation, and text generation.
The goal of our work is to develop a model of text quality for use in language applications. For newspaper text, we combine lexical, syntactic, and discourse features to produce a highly predictive model of human readers' judgments of text readability. This is the first study to take into account such a variety of linguistic factors and the first to empirically demonstrate that discourse relations are strongly associated with the perceived quality of text. We show that various surface metrics generally expected to be related to readability are poor predictors of readability judgments in our Wall Street Journal corpus. We also establish that readability predictors behave differently depending on the task: predicting the readability of a text, or comparing the readability of a pair of texts. Our experiments indicate that discourse relations are the only class of features that is robust across these two tasks.
In the context of machine translation, we study sentence fluency, which is an important component of overall text readability. We report the results of an initial study of how well surface syntactic statistics and language model features predict fluency judgments originally collected for the purpose of evaluating machine translation. We find that these features are weakly but significantly correlated with readability. Machine and human translations can be distinguished with over 80% accuracy, and accuracy on pairwise comparison of fluency is also very high, over 90%.
Tuesday, November 4, 2008
3:00 - 4:15
Wu & Chen
101 Levine Hall