# Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness

Michael Kearns, Seth Neel, Aaron Roth, Steven Wu

[arXiv] See this blog post for a longer informal exposition.

The most prevalent notions of fairness in machine learning are \emph{statistical} definitions: they fix a small collection of high-level, pre-defined groups
(such as race or gender),
and then ask for approximate parity of some statistic of the classifier (like positive classification rate or false positive rate) across these groups.
Constraints of this form are susceptible to (intentional or inadvertent) *fairness gerrymandering*, in which a classifier appears to be fair on each individual group,
but badly violates the fairness constraint on one or more structured \emph{subgroups} defined over the protected attributes
(such as certain combinations of protected attribute values).
We propose instead to demand statistical notions of fairness
across exponentially (or infinitely) many subgroups, defined by a structured class of functions over the protected attributes. This interpolates between statistical definitions of fairness, and recently proposed individual notions of fairness, but it raises several computational challenges. It is no longer clear how to even check or *audit* a fixed classifier to see if it satisfies such a strong definition of fairness. We prove that the computational problem of auditing subgroup fairness for both equality of false positive rates and statistical parity is equivalent to the problem of weak agnostic learning --- which means it is computationally hard in the worst case, even for simple structured subclasses. However, it also suggests that common heuristics for learning can be applied to successfully solve the auditing problem in practice.

We then derive two algorithms that provably converge to the best fair distribution over classifiers in a given class,
given access to oracles which can optimally solve the agnostic learning problem.
The algorithms are based on a formulation of subgroup fairness
as a two-player zero-sum game between a Learner (the primal player) and
an Auditor (the dual player). Both algorithms compute an equilibrium of this game. We obtain our first algorithm by simulating play of the game by having Learner play an instance of the no-regret *Follow the Perturbed Leader* algorithm,
and having Auditor play best response. This algorithm provably converges to an approximate Nash equilibrium
(and thus to an approximately optimal subgroup-fair distribution over classifiers) in a polynomial number of steps.
We obtain our second algorithm by simulating play of the game by having both players play *Fictitious Play*, which enjoys
only provably asymptotic convergence, but has the
merit of simplicity and faster per-step computation.
We implement the Fictitious Play version using linear regression as a heuristic oracle,
and show that we can effectively both audit and learn fair classifiers on real datasets.