Differential Privacy: An Estimation Theory-Based Method for Choosing Epsilon

Maurizio Naldi and Giuseppe D'Acquisto
Università di Roma Tor Vergata
Department of Computer Science and Civil Engineering
Via del Politecnico 1, 00133 Roma, Italy
naldi@disp.uniroma2.it, dacquisto@ing.uniroma2.it

Abstract. Differential privacy is achieved by the introduction of Laplacian noise in the response to a query, establishing a precise trade-off between the level of differential privacy and the accuracy of the database response (via the amount of noise introduced). However, the amount of noise to add is typically defined through the scale parameter of the Laplace distribution, whose use may not be so intuitive. In this paper we propose to use two parameters instead, related to the notion of interval estimation, which provide a more intuitive picture of how precisely the true output of a counting query may be gauged from the noise-polluted one (hence, how much the individual's privacy is protected).

Keywords: Statistical databases; Differential privacy; Anonymization.

1 Introduction

In statistical databases, records may include personal details, but responses are provided only for queries concerning aggregate data, i.e., as statistics. Though access to individual records may be denied, it is possible to use a combination of aggregate queries to obtain information about a single individual. In order to guarantee the privacy of users whose data are contained in statistical databases, the notion of differential privacy has been introduced [1]. Under differential privacy, the ratio between the probabilities of obtaining the same response to a query on two databases whose true outputs differ by 1 (e.g., databases differing by 1 individual) is lower bounded by e^{-ε}, where ε is the differential privacy level, and the mechanism is said to be ε-differentially private.
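The standard way to achieve ε-differential privacy for a counting query (sensitivity 1) is to add Laplace noise with scale b = 1/ε. The sketch below, with illustrative names not taken from the paper, samples such noise via inverse-CDF sampling and also computes the Laplace density, so that the e^{±ε} bound on the probability ratio for responses differing by 1 can be checked numerically:

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) random variable.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon):
    # A counting query has sensitivity 1, so the Laplace scale is b = 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon)

def laplace_pdf(x, mu, scale):
    # Density of the Laplace(mu, scale) distribution.
    return math.exp(-abs(x - mu) / scale) / (2.0 * scale)
```

For any observed response x, the ratio laplace_pdf(x, c, 1/ε) / laplace_pdf(x, c+1, 1/ε) lies between e^{-ε} and e^{ε}, which is exactly the indistinguishability guarantee for two databases whose true counts are c and c+1.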
By setting ε very close to 0, the responses to the two queries described above can be made practically indistinguishable from each other, so that their combination cannot be used to infer the datum concerning that individual. However, setting ε too close to 0 may make the queries useless, since the response could provide no clue at all about the actual values contained in the database (see [3] for the possibility of using Bayes' theorem to refine the estimation of the true value).

arXiv:1510.00917v1 [cs.CR] 4 Oct 2015

Setting the correct level of differential privacy is therefore essential for the actual use of the mechanism, allowing one to strike a balance between the wish for