Catching Cheats with Interactive Proofs: Privacy-preserving Crowd-sourced Data Collection Without Compromising Integrity

Akshay Dua, Nirupama Bulusu, and Wu-chang Feng
Department of Computer Science, Portland State University
{akshay,nbulusu,wuchang}@cs.pdx.edu

ABSTRACT

Crowd-sourced sensing systems allow people to voluntarily contribute sensor data from mobile devices. They enable numerous applications, including weather and traffic monitoring. However, their proliferation is at risk if the problems of data integrity and privacy persist. People will be reluctant to contribute sensitive information if they cannot trust the system to maintain their privacy, and the system will be reluctant to accept any data transformed to preserve privacy without proof that the transformation was computed accurately. We present an interactive proof protocol that allows an intermediary to convince a data consumer that it is accurately performing a privacy-preserving transformation with inputs from trusted sources, without providing those inputs to the consumer. We provide soundness and correctness proofs for the protocol, discuss its current limitations, and describe its parameters and their effect on data integrity and privacy when tweaked.

1. INTRODUCTION

The integrity of collected data and the privacy of its sources are first-order concerns for crowd-sourced sensing systems. Such systems can enable critical applications like faster emergency response, and improve health, environment, and traffic monitoring [13, 10, 1]. However, people will be reluctant to volunteer sensitive information (e.g., location, health statistics) if they cannot trust the system to protect their privacy. Conversely, if the volunteered information is first modified to protect privacy, then the system will be reluctant to trust that modification without proof of its integrity. Goals for integrity and privacy therefore compete with each other.
If the collected data has been previously transformed to preserve privacy (e.g., mixed, aggregated), then a data consumer cannot determine the transformation's integrity unless the raw data used as input is presented as well. However, if the raw data is presented, then the privacy of the data sources is compromised.

This work attempts to simultaneously provide integrity and privacy guarantees for published data without significantly compromising either. The system model assumes a set of trusted data sources that collect and forward sensory data to a privacy proxy, which performs a privacy-preserving transformation on the received data and finally forwards the result to a data consumer. The goal is to assure the data consumer that the proxy indeed computed the expected privacy transformation using data from the expected sources (integrity) without providing the consumer with that data (privacy). Much of the existing work on crowd-sourced sensing with a focus on integrity and privacy adheres to this model. Moreover, such a model has the advantage of decoupling the privacy transformation from data collection, enabling transformations that mix data from multiple sources, or perform application-specific data perturbations on the same data. Examples of this model include our earlier work on the design and implementation of a Trusted Sensing Peripheral (TSP) that produces and publishes trustworthy sensory information to a data portal via the user's personal mobile device [5].
The mobile device is allowed to aggregate the data from the TSP before forwarding it. In this case, the mobile device can play the role of the privacy proxy, while the portal plays the role of the data consumer. Other examples include PoolView [6], which introduces the personal privacy firewall to perturb a user's raw data before publishing it to an aggregation service; DietSense [13], which provides private storage where a user can edit the images collected from her phone before sharing them further; and AnonySense [11], which uses a trusted server to mix data from at least l clients to provide l-anonymity.

This paper presents an interactive proof protocol [9] with which only an honest privacy proxy can convince a data consumer that it is correctly computing the expected privacy-preserving transformation, while protecting the privacy of its data sources. We provide soundness and correctness proofs for the protocol, discuss its current limitations, and describe its parameters and their effect on data integrity and privacy when tweaked.

2. PROBLEM STATEMENT

Only an honest privacy proxy P should be able to convince a data consumer C that it is indeed transforming data D = {d_1j, d_2j, ..., d_nj}, received in interval j from a set of sources S = {s_1, s_2, ..., s_n}, using the transformation function f_priv. C receives the result p_j = f_priv(D), but never the data D. The system model is shown in the figure below.
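The data flow just described can be sketched as follows. This is a minimal illustration only: the choice of averaging as f_priv, and the class and variable names, are assumptions made for the example; the paper's actual protocol additionally attaches an interactive proof so that C can verify the transformation without seeing D.

```python
def f_priv(readings):
    """Example privacy-preserving transformation (assumed here to be
    averaging): aggregates raw readings into one value, hiding the
    individual contributions of the sources."""
    return sum(readings) / len(readings)

class PrivacyProxy:
    """Collects raw data D = {d_1j, ..., d_nj} from trusted sources
    during interval j, then forwards only p_j = f_priv(D) to the
    data consumer C. The consumer never receives D itself."""

    def __init__(self):
        self.buffer = []  # raw data D for the current interval

    def receive(self, d_ij):
        """Accept one reading d_ij from a trusted source s_i."""
        self.buffer.append(d_ij)

    def publish(self):
        """End the interval: compute p_j, discard D, return p_j."""
        p_j = f_priv(self.buffer)
        self.buffer = []  # raw inputs never leave the proxy
        return p_j

# Interval j: three trusted sources report readings to the proxy.
proxy = PrivacyProxy()
for reading in [68.0, 70.0, 72.0]:
    proxy.receive(reading)

print(proxy.publish())  # prints 70.0 -- the only value C sees
```

The point of the sketch is the asymmetry it makes concrete: the proxy holds D, the consumer holds only p_j; the protocol in the remainder of the paper is what lets C trust that p_j was really computed as f_priv over data from the expected sources.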