Instructor: Aaron Roth
Time: Tuesday/Thursday 1:30-3:00 pm
Room: Towne 315
Course Description: Consider the
following conundrum: You are the administrator of a large data set at a
hospital (or search engine, or social network, or phone provider,
or...). The data you hold is very valuable, and you would like to make
it available to researchers with expertise in statistics and machine
learning so that you can better make use of it. However, the data is
also highly sensitive! It consists of patient medical records,
and although you would like to make aggregate statistics available, you
must do so in a way that does not compromise the privacy of any
individual who may (or may not!) be in the data set. What can you do?
This is the question that we will explore in this course. We will
introduce and motivate the recently defined algorithmic constraint
known as differential privacy
and then go on to explore what sorts of information can and cannot be released under this constraint. The answer turns out to be "a surprisingly large amount," and in trying to answer this question, we will develop a rich theory. The material will draw upon topics in learning theory, approximation algorithms, information theory, game theory, probability, and geometry.
Prerequisites: This will be a
mathematically rigorous theory
course, but the only prerequisite is mathematical maturity. Prior
coursework in algorithms or complexity, and familiarity with
probability will be helpful. The course is intended for graduate
students, but advanced undergraduates are welcome and encouraged to
speak with the instructor.
Goals and Grading: The goal of
this course is to introduce students to differential privacy, and then
bring them up to the frontier of modern privacy research. At the end of
this course, students will be able to contribute to the research
literature on the theory of data privacy. As such, the main graded
component of this course will be a semester long research project. This
project can be either a work of pure theory, or it can have a practical
component. There is room for projects involving machine learning, game
theory and mechanism design, programming languages, complexity theory, statistics, linear algebra,
as well as pure algorithmic problems in differential privacy. Students
will meet with the instructor over the course of
the semester, present their work at the end of the class, and will be
encouraged to produce a paper with the intention of publishing it.
Topics:
- Definition and motivation of Differential Privacy
- Basic lower bounds for blatant non-privacy
- Basic building blocks: numeric-valued functions, and
perturbations from the Laplace distribution
- Basic building blocks: the exponential mechanism and non-numeric
valued functions
- Composition theorems for differentially private algorithms
- Keeping track of rare events: the sparse vector technique
- Private query release: upper and lower bounds based on the
existence of small nets
- Private query release: algorithms based on iterative database
constructions
- Private query release: algorithms for the interactive setting
- Private query release: efficient algorithms
- Privacy and Machine Learning
- Lower bounds in differential privacy
- Variations: Pan-privacy
- Private combinatorial optimization
- Privacy and Mechanism Design
- More if time allows...
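As a small taste of the "basic building blocks" topic, the Laplace mechanism of Dwork, McSherry, Nissim, and Smith (2006) can be sketched in a few lines of Python. This is an illustrative sketch only, not course code; the function name and the example numbers are our own.

```python
import math
import random

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Return a differentially private answer: the true value plus
    Laplace noise with scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) via the inverse-CDF transform of a uniform draw.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_answer + noise

# Example: a counting query ("how many patients have condition X?") has
# sensitivity 1, since adding or removing one record changes it by at most 1.
private_count = laplace_mechanism(true_answer=412.0, sensitivity=1.0, epsilon=0.1)
```

Note how the noise scale grows as the privacy parameter epsilon shrinks: stronger privacy guarantees cost more accuracy, a trade-off that recurs throughout the course.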
Keep up with the course blog: Project ideas and deadlines are listed here.
Reading List:
- Revealing Information While Preserving Privacy. Dinur and Nissim, 2003.
- Practical Privacy: the SuLQ Framework. Blum, Dwork, McSherry, Nissim, 2005.
- How To Break Anonymity of the Netflix Prize Dataset. Narayanan and Shmatikov, 2006.
- Calibrating Noise to Sensitivity in Private Data Analysis. Dwork, McSherry, Nissim, and Smith, 2006.
- Mechanism Design via Differential Privacy. McSherry and Talwar, 2007.
- Smooth Sensitivity and Sampling in Private Data Analysis. Nissim, Raskhodnikova, and Smith, 2007.
- What Can We Learn Privately? Kasiviswanathan, Lee, Nissim, Raskhodnikova, and Smith, 2008.
- A Learning Theory Approach to Non-Interactive Database Privacy. Blum, Ligett, and Roth, 2008.
- On the Complexity of Differentially Private Data Release: Efficient Algorithms and Hardness Results. Dwork, Naor, Reingold, Rothblum, and Vadhan, 2009.
- Differentially Private Empirical Risk Minimization. Chaudhuri, Monteleoni, and Sarwate, 2011.
- Pan-Private Streaming Algorithms. Dwork, Naor, Pitassi, Rothblum, and Yekhanin, 2010.
- Privacy Integrated Queries: an Extensible Platform for Privacy-Preserving Data Analysis. McSherry, 2009.
- Differentially Private Combinatorial Optimization. Gupta, Ligett, McSherry, Roth, and Talwar, 2010.
- On the Geometry of Differential Privacy. Hardt and Talwar, 2010.
- Interactive Privacy via the Median Mechanism. Roth and Roughgarden, 2010.
- The Price of Privately Releasing Contingency Tables and the Spectra of Random Matrices with Correlated Rows. Kasiviswanathan, Rudelson, Smith, and Ullman, 2010.
- Differential Privacy under Continual Observation. Dwork, Naor, Pitassi, and Rothblum, 2010.
- A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis. Hardt and Rothblum, 2010.
- Boosting and Differential Privacy. Dwork, Rothblum, and Vadhan, 2010.
- The Limits of Two-Party Differential Privacy. McGregor, Mironov, Pitassi, Reingold, Talwar, and Vadhan, 2010.
- Towards an Axiomatization of Statistical Privacy and Utility. Kifer and Lin, 2010.
- Distance Makes the Types Grow Stronger: A Calculus for Differential Privacy. Reed and Pierce, 2010.
- Pan-Private Algorithms Via Statistics on Sketches. Mir, Muthukrishnan, Nikolov, and Wright, 2010.
- PCPs and the Hardness of Generating Synthetic Data. Ullman and Vadhan, 2011.
- Privately Releasing Conjunctions and the Statistical Query Barrier. Gupta, Hardt, Roth, and Ullman, 2011.
- Privacy Preserving Statistical Estimation with Optimal Convergence Rates. Smith, 2011.
- Approximately Optimal Mechanism Design via Differential Privacy. Nissim, Smorodinsky, and Tennenholtz, 2011.
- Selling Privacy at Auction. Ghosh and Roth, 2011.
- Is Privacy Compatible with Truthfulness? Xiao, 2011.
- Differential Privacy Under Fire. Haeberlen, Pierce, and Narayan, 2011.
- Iterative Constructions and Private Data Release. Gupta, Roth, and Ullman, 2011.
- Private Data Release via Learning Thresholds. Hardt, Rothblum, and Servedio, 2011.
- Fast Private Data Release Algorithms for Sparse Queries. Blum and Roth, 2011.