CIS700: Advanced Topics in Databases -- the "Big Data" Revolution

 

Spring 2018

 

Instructor: Susan B. Davidson, 566 Levine North, 898-3490, susan@cis.upenn.edu

Prerequisites: CIS 550 or equivalent

Textbook:  Research papers will be made available over the web, linked to the course syllabus.

Time and Location: MW 1:30-3, Towne 309

Description: "Big data" has driven a revolution in database technology along several dimensions, including the need for more flexible data models, the handling of streaming and time-varying data, different notions of updates and consistency, and the need for parallelism. Because of its tight interaction with complex analysis and inference pipelines, it has also increased the need for accountability and for careful consideration of the ethical issues surrounding the use of data. In this course, we will study various aspects of this revolution through a combination of lectures on basic material and recent papers from the database literature.

We will start by revisiting the theoretical underpinnings of relational query languages -- conjunctive queries, relational algebra, and relational calculus -- and then consider Datalog, an extension that captures recursive queries. We will then look at how this formalism is applied to several practical problems: query optimization, data integration, and data citation. Moving beyond relational systems, we will study the underpinnings of NoSQL solutions based on JSON (e.g., MongoDB) and graphs (e.g., RDF and Neo4j), as well as extensions for time-varying and streaming data.

This is an advanced course intended for students who are interested in database research and who have completed a database course (e.g., CIS 550). Students will be expected to present 2-3 research papers, write summaries of the papers covered in class, and complete a project on a topic approved by the instructor.

Grading:

Detailed Syllabus (in progress)  

Project Details  



 

Susan Davidson


1/7/2018