CIS 430/530 Fall 2011

Computational Linguistics

Instructor
Ani Nenkova 
Office: Levine 505
nenkova (AT) cis.upenn.edu 
Office Hours: Wed, 4:30-5:30; Levine 505

Teaching Assistant
Alexander Shoulson
Office: Moore 103 (SIG Lab)
shoulson (AT) seas.upenn.edu 
Office Hours: Thu, 4:00-5:00; Levine 612

Class Schedule:

Monday & Wednesday, 3:00PM to 4:30PM, Towne 313

Course Administrator:

Brittany Binler, 311 Levine, melema (AT) cis.upenn.edu

Course description:

This course is an introduction to computational linguistics, centered on the fundamental question of how a machine can learn to analyze, understand, and produce language. Topics include speech synthesis and recognition, syntactic parsing, semantic interpretation, discourse and pragmatic inference, and sentiment analysis. Students will become familiar with standard practical tools and resources for automatic linguistic analysis.

Prerequisites:

Undergraduate students should have completed CIS 121 before enrolling.

Textbooks:
  • [REQUIRED] Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, O'Reilly Media, 2009. (Free Online)

  • [OPTIONAL] Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Pearson Prentice Hall, 2008. (Available on Amazon)

  • [OPTIONAL] Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999. (Free Online)

  • Various supplementary readings.

Grading:
  • 4 Homework Assignments: 60%

    • Each homework will involve implementation, design decisions and empirical evaluation.

      1. Implementation related to a specific language task. All assignments will be done in Python.

      2. Approach selection: In each homework assignment there will be a problem where a task is given, but details of how to solve the task will not be specified. Students will use knowledge acquired in lectures and general reasoning to make choices and explain the motivation behind these choices.

      3. Result analysis: Each homework will involve an empirical evaluation which will allow us to precisely quantify how well our systems work, as well as to compare the performance of different systems. We will do error analysis in order to identify where a system fails and point to possible ways to improve the system.

    • NOTE: Each homework assignment will consist of 5 sub-problems. One of these will differ between students registered for CIS 430 and those registered for CIS 530: CIS 530 students will emphasize implementation, while CIS 430 students will focus on analysis of system output. Over the four homework assignments this will amount to a 10% difference in requirements.

  • Class Project: 35%

    • The project can be implemented in any programming language.

    • Project proposal. The proposal must specify the project aim, the tools and resources necessary to achieve it, the expected results, and how performance will be evaluated. Students will get feedback on their proposals from their peers and the instructor. The goal is to define a non-trivial task that can reasonably be accomplished by the end of the semester. Students enrolled in CIS 530 will be expected to complete a project that requires substantial implementation beyond the use of standard language processing tools. CIS 430 students will be encouraged to experiment with existing tools (part-of-speech taggers, language modeling toolkits, sentiment lexicons, syntactic parsers, etc.) and to integrate the outputs of several such systems in their class project system. 2 pages. (5%)

    • [CIS 430 only] Presentation based on a scientific paper related to the project topic. The presentation will cover the selected paper's aim, approach, results, and analysis. 15 minutes. (5%)

    • [CIS 530 only] Related work write-up. The write-up must discuss at least 6 papers related to the chosen class project. It will compare and contrast the approaches described in these papers and identify weaknesses and opportunities for improvement. 4 pages. (5%)

    • Final project write-up and code (25%)

  • Class Participation: 5%
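
As a rough illustration of how these weights combine, the sketch below computes an overall score from component scores on a 0-100 scale. The names are hypothetical, and it assumes the four homework assignments are averaged into a single homework score, which the syllabus does not state.

```python
# Weights in percentage points, taken from the grading breakdown above.
# "paper_component" stands for the CIS 430 presentation or the
# CIS 530 related work write-up (hypothetical key name).
WEIGHTS = {
    "homework": 60,
    "proposal": 5,
    "paper_component": 5,
    "final_project": 25,
    "participation": 5,
}


def course_score(scores):
    """Return the weighted overall score (0-100) from component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError("missing components: %s" % sorted(missing))
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


if __name__ == "__main__":
    example = {
        "homework": 90,  # assumed: average of the four homework scores
        "proposal": 100,
        "paper_component": 80,
        "final_project": 85,
        "participation": 100,
    }
    print(course_score(example))
```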


Late-day policy:

Homework will be distributed in class and posted on the web page.
Late homeworks will be penalized based on the number of weekdays (or fractions thereof) passed since the HW was due:
  • 1 day late: 20% penalty
  • 2 days late: 30% penalty
  • 3 days late: 50% penalty
  • >3 days late: No credit
For example, if an assignment is due on Thursday at 4PM, you will receive a 20% penalty if you turn it in before 4PM Friday, a 30% penalty before 4PM Monday, a 50% penalty before 4PM Tuesday, and no credit after 4PM Tuesday.
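
The penalty schedule above can be sketched in Python (the course language). This is only an illustration, not an official grading tool: the function names are hypothetical, and it assumes each late-day cutoff falls at the original due time on the next weekday, so Saturdays and Sundays never start a new late day.

```python
from datetime import datetime, timedelta

# Penalty (fraction of credit lost) by number of weekdays late.
PENALTIES = {0: 0.0, 1: 0.20, 2: 0.30, 3: 0.50}
MAX_LATE_DAYS = 3


def weekdays_late(due, submitted):
    """Count late weekdays, rounding any fraction of a day up.

    Stops counting at MAX_LATE_DAYS + 1, which already means no credit.
    """
    days = 0
    cutoff = due
    while submitted > cutoff and days <= MAX_LATE_DAYS:
        cutoff += timedelta(days=1)
        # Assumption: cutoffs never land on Saturday (5) or Sunday (6).
        while cutoff.weekday() >= 5:
            cutoff += timedelta(days=1)
        days += 1
    return days


def late_penalty(due, submitted):
    """Return the fraction of credit lost; 1.0 means no credit."""
    return PENALTIES.get(weekdays_late(due, submitted), 1.0)


if __name__ == "__main__":
    due = datetime(2011, 11, 3, 16, 0)      # a Thursday, 4PM
    monday = datetime(2011, 11, 7, 15, 0)   # the following Monday, 3PM
    print(weekdays_late(due, monday), late_penalty(due, monday))
```

With a Thursday 4PM due date, a submission the following Monday at 3PM counts as 2 days late under these assumptions, matching the 30% penalty in the example above.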

Academic Integrity:

Code of Academic Integrity

CLASS MODULES

  • Sep 7, Lecture 1: Welcome to CIS 430/530 (Slides: [PPTX] [PDF])
  • Sep 12, Lecture 2: From Frequency to Meaning: Vector Space Models of Semantics (Slides: [PPTX] [PDF]) [Related Reading]
  • Sep 14, Lecture 3: Introduction to Language Models (Slides: [PPT] [PDF])
  • Sep 19, Python/NLTK/matplotlib Tutorial, Part 1 (Slides: [PPT] [PDF])
  • Sep 21, Python/NLTK/matplotlib Tutorial, Part 2
  • Sep 26, Lecture 4: Language Models II (Slides: [PPT] [PDF])
  • Sep 28, Lecture 5: Morphology (Slides: [PPT] [PDF])
  • Oct 3, Lecture 6: Part of Speech Tagging (Slides: [PPT] [PDF])
  • Oct 5, Lecture 7: Hidden Markov Models (Slides: [PPT] [PDF])
  • Oct 12, Lecture 8: Dynamic programming algorithms (Slides: [PPT] [PDF])
  • Oct 17, Lecture 9: Formal Grammars (Slides: [PPT] [PDF])
  • Oct 19, Lecture 10: Parsing (Slides: [PPT] [PDF])
  • Oct 24, Lecture 11: Statistical parsing (Slides: [PPT] [PDF])
  • Oct 26, Lecture 12: Automatic summarization (Slides: [PPTX] [PDF])
  • Oct 31, Lecture 13: Content selection (Slides: [PPTX] [PDF])
  • Nov 2, Lecture 14: Evaluation (Slides: [PPTX] [PDF])
  • Nov 7, Lecture 15: Word sense disambiguation (Slides: [PPT] [PDF])
  • Nov 9, Lecture 16: Word similarity (Slides: [PPT] [PDF])
  • Nov 14, Lecture 17: Semantic Roles (Slides: [PPT] [PDF])
  • Nov 16, Lecture 18: Coreference resolution (Slides: [PPT] [PDF])
  • Nov 20, Lecture 19: Discourse coherence (Slides: [PPT] [PDF])
  • Nov 30, Lecture 20: Discourse coherence and readability (Slides: [PPT] [PDF])
  • Dec 5, Lecture 21: Twitter NLP, guest lecture by Annie Louis (Slides: [PDF])

General Information:

Using Python, NLTK, Coding Standards, etc.

How to submit:
  1. Connect to eniac.seas.upenn.edu
  2. Type the command 'turnin -c cis530 -p hwx filename'
  3. If the system asks you to choose a section, type 'ALL'

OTHER RESOURCES

Project Resources

Related Reading

Python Resources


For more information, please contact nenkova (AT) cis.upenn.edu
