Fragmentation Design for Efficient Query Execution over
Sensitive Distributed Databases
Valentina Ciriani*, Sabrina De Capitani di Vimercati*, Sara Foresti*,
Sushil Jajodia†, Stefano Paraboschi‡, and Pierangela Samarati*
*DTI - University of Milan, 26013 Crema - Italy
Email: {ciriani,decapita,foresti,samarati}@dti.unmi.it
†CSIS - George Mason University, Fairfax, VA 22030-4444
Email: jajodia@gmu.edu
‡DIIMM - University of Bergamo, 24044 Dalmine - Italy
Email: parabosc@unibg.it
Abstract
The balance between privacy and utility is a classical
problem with an increasing impact on the design of modern
information systems. On the one hand, it is crucial to ensure
that sensitive information is properly protected; on the other
hand, the impact of protection on the workload must be
limited, as query efficiency and system performance remain
primary requirements. We address this privacy/efficiency
balance by proposing an approach that, starting from a flexible
definition of confidentiality constraints on a relational
schema, applies encryption parsimoniously and relies mostly
on fragmentation to protect sensitive associations among
attributes. Fragmentation is guided by workload considerations
so as to minimize the cost of executing queries over fragments.
We discuss the minimization problem that arises when fragmenting
data and provide a heuristic approach to its solution.
1. Introduction
A medical organization manages a collection of data
recording the medical histories of a community of patients.
Researchers can then access these data and effectively
and efficiently discover behavioral and social patterns that
exhibit correlation with specific pathologies, with a direct
positive impact on medical research. The downside is that a
compromise of the server can disclose patients’ information
and violate their privacy. The owner of an e-commerce Web
site must store the complete description of the financial
data about transactions executed on the site. The Web site
offers a wider choice and lower prices than a brick-and-
mortar store, producing an immediate benefit to consumers
and a considerable positive economic impact. The downside
is that a compromise of the Web server may bring cus-
tomers’ data into the black market, where they can be used
in fraudulent transactions. The two scenarios demonstrate
that, while information and communication technology can
provide important benefits, it inevitably introduces risks
of exposing private information to improper disclosure. The
proposal in this paper aims at reducing the risks introduced
by the management of sensitive information.
The crucial observation behind our approach is that users
of the system may normally need to access the data in a
way that does not introduce risks. For instance, medical
researchers may typically need to access generic and
non-identifying patient data when performing their research. The
owner of the Web site mostly accesses the financial data
about the transactions managed by the Web site with no
need to reference the personal data of the customer. On
the other hand, medical researchers may sometimes need to
evaluate parameters that may lead to the specific identity of
the patient, and the Web site owner may need to retrieve the
complete credit card data when a dispute arises. In addition,
regulations are imposing requirements on the management of
personal information that often explicitly demand the use of
encryption for the protection of sensitive data.
A promising approach to protect sensitive data or sen-
sitive associations among data stored at external parties
is represented by the combined use of fragmentation and
encryption [4]. Fragmentation and encryption provide protection
of data in storage, or when disseminated, ensuring that
no sensitive information is disclosed either directly (i.e.,
present in the database) or indirectly (i.e., derivable from
other information in the database). With this design, the
data can be outsourced and stored on an untrusted server,
typically obtaining lower costs, greater availability, and more
efficient distributed access. This scenario resembles the
“database-as-a-service” (DAS) paradigm [3], [6] and indeed
the techniques presented in the paper can be considered an
adaptation of this paradigm to a context where only part
of the information stored in the database is confidential
and where the confidentiality of associations among values
is protected by storing them in separate fragments. The
advantage of having only part of the data encrypted is that all
the queries that do not require reconstructing the confidential
© 2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for
advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or
lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.