Sentiment analysis seeks to identify the viewpoint(s) underlying a text span. Potential applications include question-answering systems that address opinions rather than facts and business intelligence systems that analyze user feedback. The research issues raised by such applications are often considerably more challenging than those in fact-based analysis. These challenges, together with the large amounts of opinion-oriented text available through the web, make this area an excellent testing ground for interesting ideas in natural language processing, as well as in machine learning, data mining, and theory. In this talk, we illustrate the new challenges and opportunities with two sentiment analysis tasks. In particular, we describe how we modeled different types of relations, an approach that can have implications outside this area as well.
One task that has attracted a great deal of attention is polarity classification, e.g., classifying a movie review as “thumbs up” or “thumbs down” from textual information alone. We considered a number of approaches, including one that applies text categorization techniques to just the subjective portions of the document. Extracting these portions can itself be a hard problem; we describe an approach based on efficient techniques for finding minimum cuts in graphs that incorporate sentence-level relations. Another task, which can be viewed as a non-standard multi-class classification task, is the rating-inference problem, where one must determine the reviewer's evaluation with respect to a multi-point scale (e.g., one to five “stars”). We apply a meta-algorithm, based on a metric labeling formulation of the problem, that explicitly exploits relations between classes. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.
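To make the minimum-cut idea concrete, here is a minimal sketch and not the speaker's implementation. It assumes each sentence comes with a per-sentence subjectivity estimate in [0, 1] (`subj_scores`) and that some sentence pairs carry a penalty for receiving different labels (`assoc`); both inputs, and the function name `min_cut_subjective`, are hypothetical. A simple Edmonds-Karp max-flow stands in for whatever efficient minimum-cut technique the talk covers.

```python
from collections import deque


def min_cut_subjective(subj_scores, assoc):
    """Return indices of sentences placed on the 'subjective' side of a minimum s-t cut.

    subj_scores[i]: assumed per-sentence subjectivity estimate in [0, 1].
    assoc[(i, j)]:  assumed penalty for giving sentences i and j different labels.
    """
    n = len(subj_scores)
    s, t = n, n + 1  # source = "subjective", sink = "objective"
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, p in enumerate(subj_scores):
        cap[s][i] = p          # cut when sentence i is labeled objective
        cap[i][t] = 1.0 - p    # cut when sentence i is labeled subjective
    for (i, j), w in assoc.items():
        cap[i][j] += w         # cut when i and j end up on different sides
        cap[j][i] += w

    def bfs_parents():
        # Breadth-first search over edges with remaining residual capacity.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Sentences still reachable from the source lie on the subjective side.
    return [i for i in range(n) if parent[i] != -1]
```

With scores `[0.9, 0.4, 0.1]` and no pairwise penalties, only sentence 0 is kept as subjective. Adding a penalty of 0.5 between sentences 0 and 1 pulls the borderline sentence 1 onto the subjective side as well, which is the kind of sentence-level relation the graph is meant to capture.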
Portions of this work are joint with Lillian Lee and Shivakumar Vaithyanathan.
Thursday, April 20, 2006
Wu & Chen Auditorium
101 Levine Hall
3:00 pm - 4:15 pm