|
|
COURSE INFORMATION
|
|
| Instructor |
Ani Nenkova, nenkova@seas.upenn.edu |
| Time |
TT 4:30-6pm |
| Location |
Moore 212 |
| Description |
Automatic summarization as part of information retrieval systems can help alleviate the information overload problem caused by the unprecedented amount of online textual information. Building a summarization system requires a good understanding of the properties of human language and the use of various natural language tools. In this course we will build several summarization systems of increasing complexity and sophistication. In the process we will learn about various natural language processing tools and resources, such as part-of-speech tagging, chunking, parsing, WordNet, and machine learning toolkits, and will review the fundamentals of information retrieval systems. We will also cover probability and statistics concepts that are used in summarization and that apply to a wide range of other language-related and information retrieval tasks. Topics to be covered include: |
| Textbook |
There is no required textbook for the class. However, here are two texts that you will find useful and interesting if you decide to pursue some of the topics further. |
| Grading |
|
|
| Date | Topic and Readings |
| Sep 4 | Very brief class intro; no real class because of conflict with CIS400 |
| Sep 9 | Course overview Introduction to summarization and language applications .ppt |
| Sep 11 | Vocabulary size and term distribution: tokenization, text normalization, stemming Reading: Chapter 23 from J-M textbook, Question Answering and Summarization .ppt |
| Sep 16 | Term weighting and vector representation of text .ppt |
| Sep 18 | Language models; Evaluation in information retrieval Reading: Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status .ppt |
| Sep 23 | Approaches to automatic summarization .ppt |
| Sep 25 | Probability theory; Bayes' theorem and Naive Bayes classification Reading: How to Write a Spelling Corrector HW1 out .ppt |
| Sep 29 | Text categorization and feature selection; chi square test .ppt |
| Oct 2 | Homework discussion; Exercises on text representations, calculating probabilities, similarities, etc.; Discussion of readings Readings: (1) A trainable document summarizer (2) Language Identification: Examining the Issues |
| Oct 6 | Measures of association: chi square test, mutual information, binomial distribution and log likelihood ratio .ppt |
| Oct 9 | Part of speech tagging HW1 due Readings: (1) Chapter 5, M&S (2) Experiments in multi-document summarization .ppt |
| Oct 14 | No class; fall break |
| Oct 16 | Homework discussion; Log likelihood ratio test and topic signature words HW2 Part1 out Readings: (1) Topic-Focused Multi-document Summarization Using an Approximate Oracle Score (2) The Automated Acquisition of Topic Signatures for Text Summarization .ppt |
| Oct 21 | Introduction to WordNet .ppt |
| Oct 23 | Word sense disambiguation and word similarity Reading: Automatic record reviews .ppt |
| Oct 28 | Word sense disambiguation |
| Oct 30 | Discussion of assigned readings .ppt |
| Nov 4 | Lexical chains for summarization Readings: (1) Using lexical chains for text summarization (2) Efficiently computed lexical chains as an intermediate representation for automatic text summarization |
| Nov 6 | Web search .ppt HW3 and take home midterm distributed |
| Nov 11 | Readings: (1) Entropy of search logs (2) A taxonomy of web search |
| Nov 14 | Discourse, coherence and anaphora resolution .ppt |
| Nov 18 | Evaluation in summarization; Summarization beyond extraction .ppt Reading: Summarization Evaluation for Text and Speech |
| Nov 20 | Discussion of midterm; writing |
| Nov 25 | Randomized tests for statistical significance |
| Dec 2 | Predicting input difficulty; Evaluation without human models .ppt Reading: Identifying correlates of input difficulty for generic multi-document summarization |
| Dec 4 | Final review Reading: Opinion mining and sentiment analysis |