CIS 639

Statistical Approaches to Natural Language Processing

Spring 2002

 

Syllabus and Overview

 

Time: Tues-Thurs 1:30-3:00pm

Place: Moore 224 

Instructor: Mitch Marcus

(mitch@linc.cis.upenn.edu)

Office: Moore 461a

Phone: 215-898-2538

 

Prerequisites: CIS 530 - Computational Linguistics

 

 

This course will extend the introduction to Statistical NLP begun as part of CIS 530. It is intended to give participants sufficient background to read and understand the current research literature independently and to carry out intermediate-level research projects in Statistical NLP.

 

The course this year will focus on standard and recent statistical methods applied to three problems in grammatical processing: part-of-speech tagging, NP chunking, and grammatical parsing. Methods investigated will include Hidden Markov Models, Maximum Entropy, probabilistic CFGs and other generative statistical models, Support Vector Machines, Memory-Based Learning, and voting methods. Brill learning will also be reviewed.

 

The class will interleave three modes:

 

  1. Lectures on the contents of Section III of Manning & Schütze, Foundations of Statistical Natural Language Processing.
  2. Student-led discussions of recent papers on NP chunking from the group of papers to be found
  3. Group discussion of the details of maximum entropy and generative probabilistic models for statistical NLP included in Michael Collins's Ph.D. dissertation and Adwait Ratnaparkhi's Ph.D. dissertation.

 

Required work will include leading a discussion of selected papers, a final paper or course project, and two or three exercises during the semester.

 

(The syllabus for CIS 639 for Spring 2000 can be found here.)