The Algorithmic Foundations of Data Privacy

Fall 2011
Instructor: Aaron Roth
Time: Tuesday/Thursday 1:30-3:00 pm
Room: Towne 315

Overview: Consider the following conundrum: You are the administrator of a large data set at a hospital (or search engine, or social network, or phone provider, or...). The data you hold is very valuable, and you would like to make it available to researchers with expertise in statistics and machine learning so that it can be put to better use. However, the data is also highly sensitive! It consists of patient medical records, and although you would like to make aggregate statistics available, you must do so in a way that does not compromise the privacy of any individual who may (or may not!) be in the data set. What can you do?

This is the question that we will explore in this course. We will introduce and motivate the recently defined algorithmic constraint known as differential privacy, and then go on to explore what sorts of information can and cannot be released under this constraint. The answer turns out to be "a surprisingly large amount", and in trying to answer this question, we will develop a rich theory. The material will draw upon topics in learning theory, approximation algorithms, information theory, game theory, probability, and geometry.

Prerequisites: This will be a mathematically rigorous theory course, but the only prerequisite is mathematical maturity. Prior coursework in algorithms or complexity, as well as familiarity with probability, will be helpful. The course is intended for graduate students, but advanced undergraduates are welcome and encouraged to speak with the instructor.

Goals and Grading: The goal of this course is to introduce students to differential privacy, and then bring them up to the frontier of modern privacy research. At the end of this course, students will be able to contribute to the research literature on the theory of data privacy. As such, the main graded component of this course will be a semester-long research project. This project can be either a work of pure theory, or it can have a practical component. There is room for projects involving machine learning; game theory and mechanism design; programming languages; complexity theory; statistics; linear algebra; as well as pure algorithmic problems in differential privacy. Students will meet with the instructor over the course of the semester, present their work at the end of the class, and will be encouraged to produce a paper with the intention of publishing it.

Topics Covered:
  1. Definition and motivation of Differential Privacy
  2. Basic lower bounds for blatant non-privacy
  3. Basic building blocks: numeric-valued functions, and perturbations from the Laplace distribution
  4. Basic building blocks: the exponential mechanism and non-numeric valued functions
  5. Composition theorems for differentially private algorithms
  6. Keeping track of rare events: the sparse vector technique
  7. Private query release: upper and lower bounds based on the existence of small nets
  8. Private query release: algorithms based on iterative database constructions
  9. Private query release: algorithms for the interactive setting
  10. Private query release: efficient algorithms
  11. Privacy and Machine Learning
  12. Lower bounds in differential privacy
  13. Variations: Pan-privacy
  14. Private combinatorial optimization
  15. Privacy and Mechanism Design
  16. More if time allows...
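To give a flavor of the basic building blocks in topics 3 and 4, here is a minimal sketch of the Laplace mechanism for numeric-valued queries. The function name and parameters are illustrative, not from any course materials: the mechanism answers a query of sensitivity Δ (the maximum amount the true answer can change between neighboring databases) by adding noise drawn from a Laplace distribution with scale Δ/ε, which satisfies ε-differential privacy.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return an epsilon-differentially private numeric answer.

    Adds noise drawn from Laplace(0, sensitivity / epsilon) to the
    true query answer. Larger sensitivity or smaller epsilon means
    more noise, i.e. stronger privacy at the cost of accuracy.
    """
    if rng is None:
        rng = np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query ("how many patients have condition X?")
# has sensitivity 1, since adding or removing one record changes
# the count by at most 1.
private_count = laplace_mechanism(true_answer=1042, sensitivity=1.0, epsilon=0.1)
```

Note the key design point: the noise scale depends only on the query's sensitivity and the privacy parameter ε, not on the data itself, which is what makes the privacy guarantee worst-case over all neighboring databases.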
Keep up with the course blog.
Project: Project ideas and deadlines are listed here.

Some Relevant Papers:

Lectures: