Science of Data Ethics (CIS 399)
Spring 2020
Tuesdays and Thursdays 1:30-3PM
Room 401B (Active Learning Classroom), 3401 Walnut Street

Instructors:

Prof. Michael Kearns
mkearns@cis.upenn.edu

Dr. Kristian Lum
kristianlum@gmail.com

Teaching Assistants:

Emily Diana
ediana@wharton.upenn.edu

Vignesh Valliyur
vigneshv@sas.upenn.edu


Course Description

More detail will be added here shortly, but you can get a broad sense of course themes and topics by visiting the website for the first offering of the course last year. There will be significant changes this year, however, primarily in the choice of the more applied case studies we examine.

Prerequisites: Familiarity with some basic statistics and probability theory will be helpful. While this is not a theory class, you need to be comfortable with mathematical notation and formalism. There may be some simple coding assignments, so some basic programming ability is needed.

Course content will include readings from the scientific literature, the mainstream media, and other articles.

Grades will be based on homeworks and quizzes, as well as class participation and a book club presentation.


Course Schedule

Date    Topics    Readings, Assignments, Announcements
Thu Jan 16
Tue Jan 21
Course Introduction and Overview
A general-audience introduction to some of the themes of the course is given in the (recommended but not required) book The Ethical Algorithm: The Science of Socially Aware Algorithm Design, by M. Kearns and A. Roth.
Thu Jan 23
Tue Jan 28
Thu Jan 30
Mathematical Foundations of Machine Learning There is a good and comprehensive set of videos and readings related to the topics we covered in these lectures in this Google machine learning course.
Tue Feb 4
(Un)Fairness in Machine Learning: COMPAS and ProPublica The following readings are required for these lectures; you should read the two ProPublica pieces before the Feb 4 lecture.

ProPublica article on COMPAS

ProPublica analysis

Thu Feb 6
Tue Feb 11
Fairness in Criminal Justice Dr. Lum spoke on the following required reading:

The impact of overbooking on a pre-trial risk assessment tool, K. Lum, C. Boudin, M. Price. Here are the accompanying slides.

Please also read this recent NYT article.

Thu Feb 13
Fairness vs. Fairness Please read the first 8 pages of the following article:

Inherent Trade-Offs in the Fair Determination of Risk Scores, J. Kleinberg, S. Mullainathan, M. Raghavan.

Please play around with the following Google demo site on fairness and ML.

Tue Feb 18
Guest speaker: Prof. Richard Berk, Penn Criminology and Statistics Please skim the following article:

Almost Politically Acceptable Criminal Justice Risk Assessment, R. Berk, A. Elzarka.

Thu Feb 20
Fairness Definitions and Consequences, Continued We will spend some time examining this website, which will be the (partial) basis of the first course homework/project.
Tue Feb 25
COMPAS/ProPublica Data Analysis Project: Working Session Today we will have a working session in groups for the data analysis project using the ProPublica dataset, which is available here.
Thu Feb 27
Post-Processing and In-Processing Approaches to Fairness;
Working Session for Data Analysis Project
Here is a document providing some ideas for your projects, which will be updated with further ideas, resources and specifications.
Tue Mar 3
More on Post-Processing/Bolt-On Approaches to Fairness;
Working Session for Data Analysis Project
Here is a paper describing one of the approaches discussed in lecture today.
Thu Mar 5
Working Session for Data Analysis Project
Tue Mar 24
Statistical Challenges in Causality Estimation The updated due date for the Data Analysis Project is Friday April 3.

Here is the Zoom link for the class lectures going forward. This same link should work for both the Tuesday and Thursday meetings for the rest of the term.

Here is the recording of this lecture.

Thu Mar 26
Data-Driven Network Modeling of Contagion Here is the recording of this lecture.
Tue Mar 31
Introduction to Privacy and Security Here are the notes from this lecture.
Thu Apr 2
Three Flawed Privacy Notions: Anonymity, Aggregation, and No Harm Whatsoever Here are the recording and notes from this lecture.
Tue Apr 7
Differential Privacy I: Randomized Response; Definition of DP Important Update: Due to security concerns, the Zoom lectures are now password-protected. All participants should have received an email with the new link and password.

Here are the recording and notes from this lecture.

Thu Apr 9
Differential Privacy II: DP analysis of Randomized Response; Sensitivity of Functions; The Laplace Mechanism Here are the recording and notes from this lecture.
Tue Apr 14
Differential Privacy III: DP analysis of the Laplace Mechanism; The Exponential Mechanism Here are the recording (password is 399Ethics!) and notes from this lecture.
Thu Apr 16
Differential Privacy IV: Synthetic Data; Repetition and Composition; Applications to Other Research Areas; Deployments of DP; The Exponential Mechanism Here are the recording (password is 399Ethics!) and notes from this lecture.

Here are some links to the deployments of DP mentioned briefly towards the end of lecture: DP in the Apple OS and some concerns over its implementation; DP at Google; a recent article expressing concern over DP in the 2020 U.S. Census; and the recent use of DP in Google's COVID-19 mobility reports.

Tue Apr 21 Book Group Presentations I Here is the recording from this lecture (password is 399Ethics!).
Thu Apr 23 Book Group Presentations II Here is the recording from this lecture (password is 399Ethics!).
Tue Apr 28 Book Group Presentations III Here is the recording from this lecture (password is 399Ethics!).