An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers ☆

David Rebollo-Monedero a,⁎, Jordi Forné a, Miguel Soriano a,b

a Department of Telematics Engineering, Universitat Politècnica de Catalunya (UPC), C. Jordi Girona 1–3, 08034 Barcelona, Spain
b Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), Av. Carl Friedrich Gauss 7, 08860 Castelldefels, Barcelona, Spain

Article info

Article history: Received 27 April 2010. Received in revised form 21 June 2011. Accepted 21 June 2011. Available online 2 July 2011.

Abstract

We present a multidisciplinary solution to the problems of anonymous microaggregation and clustering, illustrated with two applications, namely privacy protection in databases, and private retrieval of location-based information. Our solution is perturbative, is based on the same privacy criterion used in microdata k-anonymization, and provides anonymity through a substantial modification of the Lloyd algorithm, a celebrated quantization design algorithm, endowed with numerical optimization techniques. Our algorithm is particularly suited to the important problem of k-anonymous microaggregation of databases, with a small integer k representing the number of individual respondents indistinguishable from each other in the published database. Our algorithm also exhibits excellent performance in the problem of clustering or macroaggregation, where k may take on arbitrarily large values. We illustrate its applicability in this second, somewhat less common case, by means of an example of location-based services. Specifically, location-aware devices entrust a third party with accurate location information. This party then uses our algorithm to create distortion-optimized, size-constrained clusters, where k nearby devices share a common centroid location, which may be regarded as a distorted version of the original one.
The centroid location is sent back to the devices, which use it when contacting untrusted location-based information providers, in lieu of the exact home location, to enforce k-anonymity. We compare the performance of our novel algorithm to the state-of-the-art microaggregation algorithm MDAV, on both synthetic and standardized real data, which encompass the cases of small and large values of k. The most promising aspect of our proposed algorithm is its capability to maintain the same k-anonymity constraint, while outperforming MDAV by a significant reduction in data distortion, in all the cases considered.

© 2011 Elsevier B.V. All rights reserved.

Keywords: k-Anonymity; Privacy; Anonymous microaggregation; MDAV; Location-based services; Distortion-optimized quantizer design; Lloyd algorithm; k-Means method

Data & Knowledge Engineering 70 (2011) 892–921

☆ The material in this paper has been published in part in the proceedings of the 20th Tyrrhenian International Workshop on Digital Communications, Sardinia, Italy, Sept. 2–4, 2009 [1].
⁎ Corresponding author. Tel.: +34 93 401 7027.
E-mail addresses: david.rebollo@entel.upc.edu (D. Rebollo-Monedero), jforne@entel.upc.edu (J. Forné), soriano@entel.upc.edu, miquel.soriano@cttc.cat (M. Soriano).
0169-023X/$ – see front matter

1. Introduction

The right to privacy was recognized as early as 1948 by the United Nations in the Universal Declaration of Human Rights, Article 12. With the shifting of the Internet connectivity paradigm towards almost every object of everyday life, privacy will undeniably become as crucial as ever. We motivate the importance of privacy protection with two distinct applications in the growing technological fields of statistical disclosure control (SDC) and location-based services (LBSs), respectively, in this section. The next section will offer a more technical review in the context of the state of the art.
doi:10.1016/j.datak.2011.06.005