CIS 700: The Intersection of Natural Language Processing and Computer Vision

Prerequisites: CIS 519, CIS 520, or prior experience in computer vision or natural language processing

Description: This course will investigate how to build artificial intelligence systems that bridge language understanding and visual recognition. We will explore how computer vision systems can communicate what they see and how images can provide language systems with a richer understanding of the meaning of words. The goal is to introduce you to active areas of research at the intersection of natural language processing and computer vision -- for example, image captioning, visual question answering, instruction following, and visual common sense. The focus will be on reading, understanding, and critiquing papers from the last 5 to 10 years. This course will allow you to explore both ideas established early on and cutting-edge proposals from the last year.

Format: Every class period will either be
Meeting: Monday 3-4:30 EST; Wednesday 1:30-3 EST; virtually on Zoom: Here.
All sessions will be delivered live and recorded.

Enrollment: Apply via the waitlist (by permission only).
Please enroll in the discussion forum Here

Evaluation: class participation 10%, paper presentation for reading group 20%, 4 paper summaries 20%, final project 50%

The reading group sign-up and paper list can be found Here

Deliverables: Extensions and Issues: Given these challenging times, I will grant extensions liberally. You do not need to provide specific reasons. My goal is to be as flexible as possible, while still having you accomplish most of the work in the course.

The delivery of the course may evolve based on needs and accessibility. As such, the instructor reserves the right to change the schedule as needed, with as much warning as possible.


Office Hours and Contact: Please schedule with Mark directly via email: myatskar AT seas.penn.edu

Schedule
Date Topic Extra Info
Sep 2 Lecture: Introduction + Organizational  
Sep 7 No class, Labor Day.
Sep 9 Lecture: Machine Learning Background  
Sep 14 Lecture: Computer Vision and Natural Language Background  
Sep 16 Reading Group: Zero Shot Learning  
Sep 21 Lecture: Natural Language Background  
Sep 23 Reading Group: Multimodal embedding  
Sep 28 Lecture: RNNs + Captioning  
Sep 30 Reading Group: Captioning and Multimodal Machine Translation  
Oct 5 Lecture: Detection and Referring Expressions
Oct 7 Reading Group: Referring Expressions
Oct 12 Lecture: Visual Question Answering
Oct 14 Reading Group: Visual Question Answering
Oct 19 Reading Group: Visual Question Answering
Oct 21 Lecture: Dataset Bias and Gender Bias  
Oct 26 Reading Group: Bias  
Oct 28 Lecture: Reasoning  
Nov 2 Reading Group: Reasoning and Common Sense  
Nov 4 Lecture: Grounded Dialog  
Nov 9 Reading Group: Grounded Dialog  
Nov 11 TBD  
Nov 16 TBD  
Nov 18 Lecture: Transformers  
Nov 23 Guest Lecture: Jiasen Lu: ViLBERT and multitask learning
Nov 25 Lecture: Instruction Following  
Nov 30 Guest Lecture: Peter Anderson: Vision and language navigation
Dec 2 Reading Group: Bleeding-edge papers, TBD after the CVPR deadline on November 15.
Dec 7 Reading Group: Bleeding-edge papers, TBD after the CVPR deadline on November 15.
Dec 9 Project Presentations