Fall 2011

Instructor: Aaron Roth

Time: Tuesday/Thursday 1:30-3:00 pm

Room: Towne 315

Overview: Consider the following conundrum: You are the administrator of a large data set at a hospital (or search engine, or social network, or phone provider, or...). The data you hold is very valuable, and you would like to make it available to researchers with expertise in statistics and machine learning so that you can better make use of it. However, the data is also highly sensitive! It consists of patient medical records, and although you would like to make aggregate statistics available, you must do so in a way that does not compromise the privacy of any individual who may (or may not!) be in the data set. What can you do?

This is the question that we will explore in this course. We will introduce and motivate the recently defined algorithmic constraint known as differential privacy, and then go on to explore what sorts of information can and cannot be released under this constraint. The answer turns out to be "a surprisingly large amount", and in trying to answer this question, we will develop a rich theory. The material will draw upon topics in learning theory, approximation algorithms, information theory, game theory, probability, and geometry.
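For orientation, the constraint can be stated in one line. The statement below is the standard definition from the literature (pure epsilon-differential privacy); the symbols $M$, $D$, $D'$, $S$, and $\varepsilon$ are notation chosen here for illustration and are not necessarily the notation used in the lectures.

```latex
% A randomized algorithm M is \varepsilon-differentially private if, for every
% pair of databases D and D' that differ in the record of a single individual,
% and for every set S of possible outputs,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S].
\]
```

Intuitively, a small $\varepsilon$ means that the distribution over outputs is nearly the same whether or not any one individual's record is in the data set.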

Prerequisites: This will be a mathematically rigorous theory course, but the only prerequisite is mathematical maturity. Prior coursework in algorithms or complexity and familiarity with probability will be helpful. The course is intended for graduate students, but advanced undergraduates are welcome and encouraged to speak with the instructor.

Goals and Grading: The goal of this course is to introduce students to differential privacy, and then bring them up to the frontier of modern privacy research. At the end of this course, students will be able to contribute to the research literature on the theory of data privacy. As such, the main graded component of this course will be a semester-long research project. This project can be either a work of pure theory, or it can have a practical component. There is room for projects involving machine learning, game theory and mechanism design, programming languages, complexity theory, statistics, and linear algebra, as well as pure algorithmic problems in differential privacy. Students will meet with the instructor over the course of the semester, present their work at the end of the class, and be encouraged to produce a paper with the intention of publishing it.

Topics Covered:

- Definition and motivation of Differential Privacy
- Basic lower bounds for blatant non-privacy

- Basic building blocks: numeric-valued functions, and perturbations from the Laplace distribution
- Basic building blocks: the exponential mechanism and non-numeric-valued functions

- Composition theorems for differentially private algorithms
- Keeping track of rare events: the sparse vector technique
- Private query release: upper and lower bounds based on the existence of small nets

- Private query release: algorithms based on iterative database constructions
- Private query release: algorithms for the interactive setting
- Private query release: efficient algorithms
- Privacy and Machine Learning

- Lower bounds in differential privacy
- Variations: Pan-privacy
- Private combinatorial optimization
- Privacy and Mechanism Design
- More if time allows...
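As a concrete taste of the first building block listed above, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is an illustration under stated assumptions, not code from the course: the function names `laplace_mechanism` and `private_count` are hypothetical, the query is assumed to have sensitivity 1 (adding or removing one record changes the count by at most 1), and the sampler uses ordinary floating-point arithmetic, so it is not hardened for real deployments.

```python
import random


def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return true_answer plus Laplace noise of scale sensitivity / epsilon.

    Hypothetical helper for illustration only.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng if rng is not None else random.Random()
    rate = epsilon / sensitivity  # rate = 1 / scale
    # The difference of two i.i.d. exponential variables with this rate
    # is Laplace-distributed with mean 0 and scale sensitivity / epsilon.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_answer + noise


def private_count(records, predicate, epsilon, rng=None):
    """Noisy count of the records satisfying predicate.

    Assumes one individual contributes one record, so the count
    has sensitivity 1.
    """
    true_count = sum(1 for record in records if predicate(record))
    return laplace_mechanism(true_count, 1.0, epsilon, rng)


if __name__ == "__main__":
    # Toy data set: 100 made-up records with an "age" field.
    records = [{"age": age} for age in range(100)]
    noisy = private_count(records, lambda r: r["age"] >= 50, epsilon=0.5)
    print(f"noisy count of records with age >= 50: {noisy:.2f}")
```

Smaller values of `epsilon` give stronger privacy and noisier answers; the noise scale depends only on the sensitivity and `epsilon`, not on the size of the data set.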

Project: Project ideas and deadlines are listed here.

Some Relevant Papers:

- Revealing Information While Preserving Privacy. Dinur and Nissim, 2003.
- Practical Privacy: the SuLQ Framework. Blum, Dwork, McSherry, and Nissim, 2005.

- How To Break Anonymity of the Netflix Prize Dataset. Narayanan and Shmatikov, 2006.

- Calibrating Noise to Sensitivity in Private Data Analysis. Dwork, McSherry, Nissim, and Smith, 2006.
- Mechanism Design via Differential Privacy. McSherry and Talwar, 2007.

- Smooth Sensitivity and Sampling in Private Data Analysis. Nissim, Raskhodnikova, and Smith, 2007.

- What Can We Learn Privately? Kasiviswanathan, Lee, Nissim, Raskhodnikova, and Smith, 2008.
- A Learning Theoretic Approach to Non-Interactive Database Privacy. Blum, Ligett, and Roth, 2008.
- On the Complexity of Differentially Private Data Release: Efficient Algorithms and Hardness Results. Dwork, Naor, Reingold, Rothblum, and Vadhan, 2009.
- Differentially Private Empirical Risk Minimization. Chaudhuri, Monteleoni, and Sarwate, 2009.
- Pan-Private Streaming Algorithms. Dwork, Naor, Pitassi, Rothblum, and Yekhanin, 2009.

- Privacy integrated queries: an extensible platform for privacy-preserving data analysis. McSherry, 2009.
- Differentially Private Combinatorial Optimization. Gupta, Ligett, McSherry, Roth, and Talwar, 2010.

- On the Geometry of Differential Privacy. Hardt and Talwar, 2010.
- Interactive Privacy via the Median Mechanism. Roth and Roughgarden, 2010.

- The Price of Privately Releasing Contingency Tables and the Spectra of Random Matrices with Correlated Rows. Kasiviswanathan, Rudelson, Smith, and Ullman, 2010.
- Differential Privacy under Continual Observation. Dwork, Naor, Pitassi, and Rothblum, 2010.

- A Multiplicative Weights Mechanism for Privacy Preserving Data Analysis. Hardt and Rothblum, 2010.
- Boosting and Differential Privacy. Dwork, Rothblum, and Vadhan, 2010.
- The Limits of Two Party Differential Privacy. McGregor, Mironov, Pitassi, Reingold, Talwar, and Vadhan, 2010.

- Towards an Axiomatization of Statistical Privacy and Utility. Kifer and Lin, 2010.

- Distance Makes the Types Grow Stronger: A Calculus for Differential Privacy. Reed and Pierce, 2010.
- Pan-Private Algorithms Via Statistics on Sketches. Mir, Muthukrishnan, Nikolov, and Wright, 2010.

- PCPs and the Hardness of Generating Synthetic Data. Ullman and Vadhan, 2011.
- Privately Releasing Conjunctions and the Statistical Query Barrier. Gupta, Hardt, Roth, and Ullman, 2011.
- Privacy Preserving Statistical Estimation with Optimal Convergence Rates. Smith, 2011.
- Approximately Optimal Mechanism Design via Differential Privacy. Nissim, Smorodinsky, and Tennenholtz, 2011.

- Selling Privacy at Auction. Ghosh and Roth, 2011.

- Is Privacy Compatible with Truthfulness? Xiao, 2011.

- Differential Privacy Under Fire. Haeberlen, Pierce, and Narayan, 2011.

- Iterative Constructions and Private Data Release. Gupta, Roth, and Ullman, 2011.

- Private Data Release via Learning Thresholds. Hardt, Rothblum, and Servedio, 2011.

- Fast Private Data Release Algorithms for Sparse Queries. Blum and Roth, 2011.

- Lecture 1: Overview -- 9/8/2011
- Lecture 2: Blatant Non-Privacy and the Laplace Mechanism -- 9/13/2011
- Lecture 3: The Exponential Mechanism -- 9/15/2011
- Lecture 4: Composition Theorems -- 9/20/2011
- Lecture 5: The Net Mechanism -- 9/27/2011
- Lecture 6: The Net Mechanism: A Partial Converse -- 9/29/2011
- Lecture 7: Iterative Database Construction -- 10/4/2011
- Lecture 8: Database Update Algorithms: The Median Mechanism -- 10/6/2011
- Lecture 9: Database Update Algorithms: The Multiplicative Weights Mechanism -- 10/13/2011
- Lecture 10: Guest Lecture by Marco Gaboardi on Fuzz -- 10/18/2011
- Lecture 11: The Sparse Vector Technique -- 10/20/2011
- Lecture 12: Interactive Query Release Mechanisms -- 10/25/2011
- Lecture 13: Efficient Interactive Query Release Mechanisms (Sparse Queries) -- 10/27/2011
- Lecture 14: Query Release and Agnostic Learning -- 11/1/2011
- Lecture 15: Private Combinatorial Optimization: Min-Cut -- 11/3/2011
- Lecture 16: Private Combinatorial Optimization: Vertex Cover -- 11/8/2011
- Lecture 17: Smooth Sensitivity and Computing the Median -- 11/10/2011 (Notes temporarily down to correct a bug)
- Lecture 18: Streaming Algorithms: Pan Privacy and User Level Privacy -- 11/15/2011
- Lecture 19: Streaming Algorithms: Continual Output -- 11/17/2011
- Lecture 20: Privacy and Mechanism Design 1 -- 11/22/2011
- Lecture 21: Privacy and Mechanism Design 2 -- 11/29/2011
- Group Presentations: 1) Justin, 2) Arjun + Mukund, 3) Rasul -- 12/1/2011
- Group Presentations: 1) Mehdi + Shaudi, 2) Mingchen + Nan, 3) Zhiyi -- 12/6/2011
- Group Presentations: 1) Chris, 2) Hoda + Salar -- 12/8/2011

Related Courses: