An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers ☆
David Rebollo-Monedero a,⁎, Jordi Forné a, Miguel Soriano a,b
a Department of Telematics Engineering, Universitat Politècnica de Catalunya (UPC), C. Jordi Girona 1–3, 08034 Barcelona, Spain
b Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), Av. Carl Friedrich Gauss 7, 08860 Castelldefels, Barcelona, Spain
Article info
Article history:
Received 27 April 2010
Received in revised form 21 June 2011
Accepted 21 June 2011
Available online 2 July 2011
Abstract
We present a multidisciplinary solution to the problems of anonymous microaggregation and
clustering, illustrated with two applications, namely privacy protection in databases, and
private retrieval of location-based information. Our solution is perturbative, is based on the
same privacy criterion used in microdata k-anonymization, and provides anonymity through a
substantial modification of the Lloyd algorithm, a celebrated quantization design algorithm,
endowed with numerical optimization techniques.
Our algorithm is particularly suited to the important problem of k-anonymous microaggregation
of databases, with a small integer k representing the number of individual respondents
indistinguishable from each other in the published database. Our algorithm also exhibits
excellent performance in the problem of clustering or macroaggregation, where k may take on
arbitrarily large values. We illustrate its applicability in this second, somewhat less common
case, by means of an example of location-based services. Specifically, location-aware devices
entrust a third party with accurate location information. This party then uses our algorithm to
create distortion-optimized, size-constrained clusters, where k nearby devices share a common
centroid location, which may be regarded as a distorted version of the original one. The
centroid location is sent back to the devices, which use it when contacting untrusted location-based
information providers, in lieu of the exact home location, to enforce k-anonymity.
We compare the performance of our novel algorithm to the state-of-the-art microaggregation
algorithm MDAV, on both synthetic and standardized real data, which encompass the cases of
small and large values of k. The most promising aspect of our proposed algorithm is its
capability to maintain the same k-anonymity constraint, while outperforming MDAV by a
significant reduction in data distortion, in all the cases considered.
© 2011 Elsevier B.V. All rights reserved.
Keywords:
k-Anonymity
Privacy
Anonymous microaggregation
MDAV
Location-based services
Distortion-optimized quantizer design
Lloyd algorithm
k-Means method
1. Introduction
The right to privacy was recognized as early as 1948 by the United Nations in Article 12 of the Universal Declaration of Human
Rights. As Internet connectivity extends to almost every object of everyday life, privacy will undeniably become as crucial as
ever. In this section, we motivate the importance of privacy protection with two distinct applications, one from each of the
growing technological fields of statistical disclosure control (SDC) and location-based services (LBSs). The next section offers
a more technical review in the context of the state of the art.
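To make concrete the two quantities that the abstract trades off, the anonymity level k and the data distortion, the following minimal Python sketch may help. It is neither the algorithm proposed in this paper nor MDAV: the cluster assignment is simply given as input. The function names (`microaggregate`, `anonymity_level`, `distortion`) are hypothetical, and measuring distortion as mean squared Euclidean error is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def microaggregate(points, labels):
    """Replace every record by the centroid of the cluster it belongs to."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    published = [None] * len(points)
    for members in groups.values():
        dim = len(points[members[0]])
        centroid = tuple(
            sum(points[i][d] for i in members) / len(members) for d in range(dim)
        )
        for i in members:
            published[i] = centroid
    return published


def anonymity_level(labels):
    """Size of the smallest cluster: the k for which the data is k-anonymous."""
    return min(Counter(labels).values())


def distortion(points, published):
    """Mean squared Euclidean error per record (an assumed distortion measure)."""
    total = sum(
        sum((x - y) ** 2 for x, y in zip(p, q)) for p, q in zip(points, published)
    )
    return total / len(points)


# Toy data: four 2-D records grouped into two clusters of two records each.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 14.0)]
labels = [0, 0, 1, 1]

published = microaggregate(points, labels)
k = anonymity_level(labels)
mse = distortion(points, published)
```

Under this reading, two clustering methods are compared by fixing the same minimum cluster size k and checking which one yields the lower distortion.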
Data & Knowledge Engineering 70 (2011) 892–921
☆ The material in this paper has been published in part in the proceedings of the 20th Tyrrhenian International Workshop on Digital Communications, Sardinia,
Italy, Sept. 2–4, 2009 [1].
⁎ Corresponding author. Tel.: +34 93 401 7027.
E-mail addresses: david.rebollo@entel.upc.edu (D. Rebollo-Monedero), jforne@entel.upc.edu (J. Forné), soriano@entel.upc.edu, miquel.soriano@cttc.cat (M. Soriano).
0169-023X/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.datak.2011.06.005