Differentially Private Two-Party Set Operations

Bailey Kacsmar*, Basit Khurram*, Nils Lukas*, Alexander Norton*, Masoumeh Shafieinejad*, Zhiwei Shang*, Yaser Baseri*, Maryam Sepehri†, Simon Oya*, Florian Kerschbaum*

*Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
*{bkacsmar, mbkhurra, nlukas, ar2norto, masoumeh, z6shang, ybaseri, simon.oya, fkerschb}@uwaterloo.ca
†Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
†maryam.sepehri@unimi.it

Abstract—Private set intersection (PSI) allows two parties to compute the intersection of their data without revealing the data they possess outside of the intersection. However, in many cases of joint data analysis, the intersection itself is also sensitive. We define differentially private set intersection and propose new protocols, based on (leveled) homomorphic encryption, whose result is differentially private. Our circuit-based approach is adaptable enough to achieve differential privacy as well as to compute predicates over the intersection, such as its cardinality. Furthermore, our protocol produces differentially private output for set intersection and set intersection cardinality with optimal communication and computation complexity. For a client set of size m and a server set of size n, where m is smaller than n, our communication complexity is O(m), while previous circuit-based protocols only achieve O(n + m) communication complexity. In addition to our asymptotic optimizations, which include a new analysis of nested cuckoo hashing for PSI, we demonstrate the practicality of our protocol through an implementation showing that computing the differentially private intersection is feasible for large data sets containing millions of elements.

Index Terms—differential privacy, homomorphic encryption, private set intersection

1. Introduction

Private set intersection (PSI) [1]–[3] can protect sensitive data when the intersection is non-sensitive. It is, for example, used by Google and a partner to compute ad conversions [4]. In this work, we present protocols designed for cases of joint data analysis where the intersection is also sensitive.

Consider the case in which Google and Mastercard exchanged credit card transaction data without PSI [5]. Google paid Mastercard for access to individuals' credit card transactions, which it could match to the ads presented to those users. One can argue that the fact that a credit card purchase was made is personally identifiable information. User-specific credit card purchases have been used to re-identify individuals in anonymized credit card records [6]. Hence, it is necessary to protect not only the users outside the intersection but also those inside it. To protect users inside the intersection of two data sets, we propose a new variant of PSI and demonstrate that our construction for this variant can be used in practical settings.

In this paper, we define differentially private set operations and contribute a new private set intersection protocol whose result is differentially private, i.e., the intersection is protected as well. Circuit-based PSI protocols [7]–[9] can perform this function in theory. However, we improve on those protocols in communication complexity and memory consumption. For large circuits, memory consumption is commonly the bottleneck [10]. Furthermore, even in the best case for previous circuit-based protocols, the communication complexity is proportional to the sum of the sizes of the two databases [9]. We present the first circuit-based PSI protocol based on (leveled) homomorphic encryption. Our solution is asymptotically optimal with respect to several criteria. Let the client hold a set of size m and the server a set of size n, where m < n. Then our communication complexity is O(m).
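To make the circuit-based idea more concrete, the following is a minimal plaintext sketch, with no encryption at all, of the kind of circuit such a protocol might evaluate: a bitwise equality test for each pair of elements, summed into an intersection cardinality, followed by noise for differential privacy. In a homomorphic-encryption protocol these gates would be evaluated on encrypted client data; here everything is in the clear. All function names are hypothetical, the all-pairs loop deliberately ignores the hashing used to avoid O(nm) comparisons, and the two-sided geometric mechanism is an illustrative choice of noise, not necessarily the mechanism used by the protocol.

```python
import math
import random


def and_tree(bits):
    """Multiply 0/1 values pairwise, level by level; return (product, depth)."""
    level = list(bits)
    depth = 0
    while len(level) > 1:
        nxt = [level[i] * level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd leftover is carried up unchanged
        level = nxt
        depth += 1  # one multiplicative level per round
    return level[0], depth


def equal(x, y, ell):
    """Return (1 if x == y else 0, depth) using only XNOR and AND gates."""
    # XNOR of two bits; over GF(2) this is 1 + x_i + y_i, i.e. additions only.
    bits = [1 ^ ((x >> i) & 1) ^ ((y >> i) & 1) for i in range(ell)]
    return and_tree(bits)


def membership(x, server_set, ell):
    """1 if x is in server_set, else 0. Server elements are distinct, so at most
    one equality bit is 1 and their sum equals their OR (no extra multiplication)."""
    return sum(equal(x, y, ell)[0] for y in server_set)


def intersection_cardinality(client, server, ell):
    # All-pairs comparison for clarity: O(nm) work, which hashing would avoid.
    return sum(membership(x, server, ell) for x in client)


def two_sided_geometric(epsilon, rng):
    """Sample Z with P(Z = z) proportional to exp(-epsilon * |z|)."""
    alpha = math.exp(-epsilon)

    def geometric():
        u = 1.0 - rng.random()  # u in (0, 1], so log(u) is defined
        return math.floor(math.log(u) / math.log(alpha))

    return geometric() - geometric()


def dp_cardinality(client, server, ell, epsilon, rng):
    # Cardinality has sensitivity 1, so adding this noise gives epsilon-DP.
    return intersection_cardinality(client, server, ell) + two_sided_geometric(epsilon, rng)


if __name__ == "__main__":
    client, server = [1, 2, 3, 4], [3, 4, 5, 6, 7]
    print(equal(3, 3, 32))  # (1, 5): 32 bits need 5 AND levels
    print(intersection_cardinality(client, server, 32))  # 2
    print(dp_cardinality(client, server, 32, 1.0, random.Random(0)))
```

In this sketch the multiplicative depth comes entirely from the AND tree, which needs ⌈log₂ ℓ⌉ levels for ℓ-bit elements; the XNOR layer costs no multiplications when the plaintext space is GF(2).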
Our computation complexity is O(n + m) (or O(n), since m < n). Our differentially private output is optimally accurate for set intersection cardinality [13]. Note that the most recent circuit-based PSI protocols [8], [9] have communication complexity of at least O(m + n), and previous PSI protocols based on homomorphic encryption [11], [12] cannot compute arbitrary circuits (as is necessary for differential privacy) and, in addition, have computation complexity O(nm).

Beyond this theoretical optimality, we perform a number of optimizations that make our protocols practical. Let each element have bit length ℓ ≥ log n. The multiplicative depth of our circuit is log ℓ + 1, which for 32-bit elements is a depth of six (log 32 + 1 = 6) and is hence practically feasible with many homomorphic encryption schemes. Furthermore, we use plaintext vectorization, and our implementation uses a hashing technique that performs better in practice than the asymptotically optimal cuckoo hashing. We are the first to show that the secure computation of differentially private set operations, namely intersection and intersection cardinality, is practically feasible. Although our practical performance cannot compete with the most efficient protocols, whether based on homomorphic encryption [12] or on circuits [9], we can handle large data sets with up to millions of elements, and those protocols [9], [12] do not protect privacy in the intersection. In particular, our communication cost when comparing m = 4096 client elements to n = 10^6 server elements

¹ Actually, our communication complexity is O(mℓ), where ℓ represents the bit length. However, we follow the notation of recent related work [8], [9], [11], [12], in which ℓ is assumed constant, and concentrate on the parameters n and m.