Course Home Page for Science of Data Ethics (CIS 399) University of Pennsylvania, Spring 2021

Science of Data Ethics (CIS 399)
Spring 2021
All lectures virtual and recorded live
Tuesdays and Thursdays 10:30AM-12PM ET

Instructor:

Prof. Michael Kearns
mkearns@cis.upenn.edu
Office hours (virtual): Tuesdays at noon (right after lecture) or by appointment.

Teaching Assistants:

Sheng Gao
shenggao@wharton.upenn.edu
Office Hours: Wednesdays 11AM-12PM ET or by appointment.

Subin Lee
subinlee@wharton.upenn.edu
Office Hours: Tuesdays 3-4PM ET or by appointment.

Hua Wang
wanghua@wharton.upenn.edu
Office Hours: Fridays 3:30-4:30PM ET or by appointment.

Course Description

This course is about the social and human problems that can arise from algorithms, AI and machine learning, and how we might design these technologies to be "better behaved" in the first place. It is first and foremost a science or engineering course, since we will be developing algorithm design principles. You can get a broad sense of course themes and topics by visiting the websites for the 2020 and 2019 offerings.

Prerequisites: Familiarity with some machine learning, basic statistics and probability theory will be helpful. While this is not a theory class, you need to be comfortable with mathematical notation and formalism. There will be some simple coding and data analysis assignments, so some basic programming ability is needed.

Course content will include readings from the scientific literature, the mainstream media and other articles and books.

Grades will be based on homeworks and quizzes, as well as class participation and a book club presentation.

Important: All lectures for the course will be recorded live each Tuesday and Thursday from 10:30AM to noon ET. While I realize it's not possible for some of you, I'd like to encourage anyone who is able to attend the live lectures, participate in discussions, and ask questions. This engagement makes the lectures better for everyone --- me, the attendees, and those watching the recordings.

Here is the Zoom link for the live sessions. After each lecture, the recordings and lecture notes will be posted to the schedule below.

Lecture Dates

Topic

Slides, Readings, Assignments, Announcements

Thu Jan 21
Tue Jan 26
Course Introduction and Overview
Lecture videos: [Jan 21] [Jan 26]
Lecture notes
A general-audience introduction to some of the themes of the course is given in the (recommended but not required) book The Ethical Algorithm: The Science of Socially Aware Algorithm Design, by M. Kearns and A. Roth.
Also recommended but not required: Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by C. O'Neil.

Thu Jan 28
Tue Feb 2
Thu Feb 4
Foundations of Machine Learning
Lecture videos: [Jan 28] [Feb 2] [Feb 4]
Lecture notes
Here is Homework 1, tentatively due Feb 16.
There is a good and comprehensive set of videos and readings related to many of the topics we covered in these lectures in this Google machine learning course.

Tue Feb 9
Thu Feb 11
Tue Feb 16
Thu Feb 18
(Un)Fairness in Machine Learning: COMPAS and ProPublica
Lecture videos: [Feb 9] [Feb 11] [Feb 16] [Feb 18]
Lecture notes
The following readings are required; you should read the two ProPublica pieces before the Feb 9 lecture so we can discuss them then.
ProPublica article on COMPAS
ProPublica analysis
COMPAS Risk Assessment Survey (just skim)
Practitioner's Guide to COMPAS Core (no need to read, but we'll peruse a bit together in lecture)
ProPublica github repository, including dataset (we'll look at the dataset a bit in lecture)
Northpointe response to ProPublica (just skim)

Tue Feb 23
Thu Feb 25
Tue Mar 2
Thu Mar 4
Tue Mar 9
Tue Mar 16
Thu Mar 18
Tue Mar 23
Fairness in ML: Models and Algorithms
Lecture videos: [Feb 23] [Feb 25] [Mar 2] [Mar 4] [Mar 9] [Mar 16] [Mar 18 (Berk)] [Mar 23]
Lecture notes
Readings:
Inherent Trade-Offs in the Fair Determination of Risk Scores, J. Kleinberg, S. Mullainathan, M. Raghavan. (read first 8 pages)
The Frontiers of Fairness in Machine Learning. Alexandra Chouldechova, Aaron Roth. (read entire article)
Please play around with the following Google demo site on fairness and ML.
Here is the zip file for Homework 2, which is due Monday March 8.
On Thursday March 18, we will finish off our fairness studies with a guest lecture by Penn's own Richard Berk, who will speak on "Politically Correct Criminal Justice Risk Assessment". Please read in advance the following related paper (long and technical, so just peruse for the high-level messages), and this New York Times op-ed.

Thu Mar 25
Thu Apr 1
Tue Apr 6
Thu Apr 8
Tue Apr 13
Thu Apr 15
Tue Apr 20
Thu Apr 22
Algorithmic Privacy
Lecture videos: [Mar 25] [Apr 1] [Apr 6] [Apr 8] [Apr 13] [Apr 15] [Apr 20] [Apr 22]
Lecture notes
Readings:
Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization. P. Ohm. (not assigned, but for your perusal)
Differentially Private Query Release Through Adaptive Projection. S. Aydore, W. Brown, M. Kearns, K. Kenthapadi, L. Melis, A. Roth, A. Siva.
Differential Privacy Overview. Apple.
How One of Apple's Key Privacy Safeguards Falls Short. Wired magazine.
Privacy Loss in Apple's Implementation of Differential Privacy on MacOS 10.12 J. Tang, A. Korolova, X. Bai, X. Wang, X. Wang.
Implementing Differential Privacy: Seven Lessons From the 2020 United States Census. M. Hawes.
See how your community is moving around differently due to COVID-19. Google Covid Mobility Reports.
Here is the zip file for Homework 3, which is due Tuesday April 6.
It's time to start thinking and planning for "book club", which will be held in the final lectures of the semester. (There may also be a brief written homework assignment on differential privacy.) For book club, you should form teams of no more than five people, and choose a relatively recent book on technology, society and ethics. It's fine and common for the books to be written for a general audience. You will read the book and discuss with your team, and prepare a roughly 15-minute oral/slide presentation of your book in class sessions. An alternative is to identify a couple/few related technical papers from the scientific literature on course themes, and present an overview/synthesis of the papers. More details to come, but action items for now are to form your teams, and to propose your books to me. I'd like to make sure there are no duplicates, and ideally that the books are ones I haven't heard about before.

Wed Apr 28
Thu Apr 29
Book Club Presentations
Videos: [Apr 28] [Apr 29]
