Buying Private Data without Verification

Arpita Ghosh, Katrina Ligett, Aaron Roth, Grant Schoenebeck

[arXiv]

We consider the problem of designing a survey to aggregate non-verifiable information from a privacy-sensitive population: an analyst wants to compute some aggregate statistic from the private bits held by each member of a population, but cannot verify the correctness of the bits reported by participants in the survey. Individuals in the population are strategic agents with a cost for privacy, i.e., in determining their utilities they account not only for the payments they expect to receive from the mechanism, but also for their privacy costs from any information revealed about them by the mechanism's outcome (the computed statistic as well as the payments). How can the analyst design payments to obtain an accurate estimate of the population statistic when individuals strategically decide both whether to participate and whether to truthfully report their sensitive information?

We design a differentially private peer-prediction mechanism that supports accurate estimation of the population statistic as a Bayes-Nash equilibrium in settings where agents have explicit preferences for privacy. The mechanism requires knowledge of the marginal prior distribution on bits, but does not need full knowledge of the marginal distribution on the costs, instead requiring only an approximate upper bound. Our mechanism guarantees differential privacy to each agent against any adversary who can observe the statistical estimate output by the mechanism, as well as the payments made to the n-1 other agents. Finally, we show that with slightly more structured assumptions on the privacy cost functions of each agent, the cost of running the survey goes to 0 as the number of agents grows without bound.
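
For reference, the standard definition of differential privacy can be written as below. This is the textbook form, not a statement taken from the paper; the symbols M, b, b', S and \varepsilon are notation assumed here for illustration, and the paper's exact variant (for example, how the adversary's view is defined) may differ.

```latex
% Assumed notation (not from the abstract):
%   b, b'   : vectors of reported bits that differ only in agent i's report
%   M       : the randomized mechanism; its output is taken to be what the
%             adversary observes (the estimate and the other agents' payments)
%   S       : any set of possible outputs
%   \varepsilon : the privacy parameter, \varepsilon \ge 0
\[
  \Pr\bigl[ M(b) \in S \bigr]
  \;\le\;
  e^{\varepsilon} \, \Pr\bigl[ M(b') \in S \bigr]
  \qquad \text{for all such } b,\, b' \text{ and all } S .
\]
```

Read against the abstract, this says that changing one agent's report shifts the distribution of what the adversary sees by at most a multiplicative factor of e^{\varepsilon}.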