The Algorithmic Foundations of Data Privacy

Fall 2011
Instructor: Aaron Roth
Time: Tuesday/Thursday 1:30-3:00 pm
Room: Towne 315

Overview: Consider the following conundrum: You are the administrator of a large data set at a hospital (or search engine, or social network, or phone provider, or...). The data you hold is very valuable, and you would like to make it available to researchers with expertise in statistics and machine learning so that it can be put to better use. However, the data is also highly sensitive! It consists of patient medical records, and although you would like to make aggregate statistics available, you must do so in a way that does not compromise the privacy of any individual who may (or may not!) be in the data set. What can you do?

This is the question that we will explore in this course. We will introduce and motivate the recently defined algorithmic constraint known as differential privacy, and then go on to explore what sorts of information can and cannot be released under this constraint. The answer turns out to be "a surprisingly large amount", and in trying to answer this question, we will develop a rich theory. The material will draw upon topics in learning theory, approximation algorithms, information theory, game theory, probability, and geometry.

Prerequisites: This will be a mathematically rigorous theory course, but the only prerequisite is mathematical maturity. Prior coursework in algorithms or complexity, as well as familiarity with probability, will be helpful. The course is intended for graduate students, but advanced undergraduates are welcome and encouraged to speak with the instructor.

Goals and Grading: The goal of this course is to introduce students to differential privacy, and then bring them up to the frontier of modern privacy research. At the end of this course, students will be able to contribute to the research literature on the theory of data privacy. As such, the main graded component of this course will be a semester-long research project. This project can be either a work of pure theory, or it can have a practical component. There is room for projects involving machine learning; game theory and mechanism design; programming languages; complexity theory; statistics; linear algebra; as well as pure algorithmic problems in differential privacy. Students will meet with the instructor over the course of the semester, present their work at the end of the class, and will be encouraged to produce a paper with the intention of publishing it.

Topics Covered:
  1. Definition and motivation of Differential Privacy
  2. Basic lower bounds for blatant non-privacy
  3. Basic building blocks: numeric-valued functions, and perturbations from the Laplace distribution
  4. Basic building blocks: the exponential mechanism and non-numeric valued functions
  5. Composition theorems for differentially private algorithms
  6. Keeping track of rare events: the sparse vector technique
  7. Private query release: upper and lower bounds based on the existence of small nets
  8. Private query release: algorithms based on iterative database constructions
  9. Private query release: algorithms for the interactive setting
  10. Private query release: efficient algorithms
  11. Privacy and Machine Learning
  12. Lower bounds in differential privacy
  13. Variations: Pan-privacy
  14. Private combinatorial optimization
  15. Privacy and Mechanism Design
  16. More if time allows...
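To give a flavor of the basic building blocks in topics 3 and 4, here is a minimal sketch of the Laplace mechanism for numeric-valued queries. The function name and parameters are illustrative, not from any course materials: the mechanism answers a query of sensitivity Δ (the maximum amount the true answer can change between neighboring databases) by adding noise drawn from a Laplace distribution with scale Δ/ε, which satisfies ε-differential privacy.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return an epsilon-differentially private numeric answer.

    Adds noise drawn from Laplace(0, sensitivity / epsilon) to the
    true query answer. Larger sensitivity or smaller epsilon means
    more noise, i.e. stronger privacy at the cost of accuracy.
    """
    if rng is None:
        rng = np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query ("how many patients have condition X?")
# has sensitivity 1, since adding or removing one record changes
# the count by at most 1.
private_count = laplace_mechanism(true_answer=1042, sensitivity=1.0, epsilon=0.1)
```

Note the key design point: the noise scale depends only on the query's sensitivity and the privacy parameter ε, not on the data itself, which is what makes the privacy guarantee worst-case over all neighboring databases.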
Keep up with the course blog.
Project: Project ideas and deadlines are listed here.

Some Relevant Papers:

Lectures: