Database Management as a Service: Challenges and Opportunities
Divyakant Agrawal (1), Amr El Abbadi (1), Fatih Emekci (2), Ahmed Metwally (3)
(1) Department of Computer Science, University of California at Santa Barbara, Santa Barbara, CA 93106, USA; agrawal@cs.ucsb.edu, amr@cs.ucsb.edu
(2) LinkedIn Corporation, 2029 Stierlin Court, Mountain View, CA 94043, USA; fatihemekci@gmail.com
(3) Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA; ametwally@gmail.com
Abstract— Data outsourcing, or database as a service, is a new paradigm for data management in which a third-party service provider hosts a database as a service. The service provides data management for its customers and thus obviates the need for the service user to purchase expensive hardware and software, deal with software upgrades, and hire professionals for administrative and maintenance tasks. Because an external database service promises reliable data storage at low cost, it is very attractive to companies. Such a service would also provide universal access, through the Internet, to private data stored at reliable and secure sites. However, recent government legislation, competition among companies, and database thefts require companies to use secure, privacy-preserving data management techniques. The data provider therefore needs to guarantee that the data is secure and must be able to execute queries on the data, while the results of those queries must also be secure and not visible to the data provider. Current research has focused only on how to index and query encrypted data. However, querying encrypted data is computationally very expensive. Providing an efficient trust mechanism that pushes both database service providers and clients to behave honestly has emerged as one of the most important problems to solve before data outsourcing can become a viable paradigm. In this paper, we describe scalable privacy-preserving algorithms for data outsourcing. Instead of encryption, which is computationally expensive, we use distribution across multiple data provider sites and provably information-theoretically secure secret sharing algorithms as the basis for privacy-preserving outsourcing. The technical contribution of this paper is the establishment and development of a framework for efficient, fault-tolerant, scalable, and theoretically secure privacy-preserving data outsourcing that supports a diversity of database operations executed on different types of data, and that can even leverage publicly available data sets.
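The abstract proposes secret sharing across multiple provider sites as the alternative to encryption but does not name a specific scheme at this point. The Python sketch below assumes Shamir's (k, n) threshold scheme, a standard information-theoretically secure construction, and is only an illustration, not the paper's own algorithm. Each of n providers stores one point of a random degree-(k-1) polynomial whose constant term is the secret. Fewer than k providers learn nothing about the value, and any k can rebuild it, which gives the fault tolerance the abstract mentions. The field modulus PRIME and the helper names split_secret, reconstruct, and add_shares are hypothetical. The add_shares helper shows one way a database operation (here a SUM) can run directly on shares: each provider adds its own shares locally.

```python
import secrets

# Hypothetical field modulus: a prime larger than any value to be shared.
# 2**61 - 1 is a Mersenne prime; the paper does not prescribe this choice.
PRIME = 2**61 - 1


def split_secret(secret, k, n):
    """Split `secret` into n shares so that any k of them recover it.

    Shamir (k, n) threshold scheme: pick a random polynomial q of degree
    k - 1 with q(0) = secret and give provider i the point (i, q(i)).
    With uniformly random coefficients, fewer than k shares reveal
    nothing about the secret.
    """
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    if not 1 <= k <= n < PRIME:
        raise ValueError("need 1 <= k <= n < PRIME")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def q(x):
        # Horner's rule, evaluated modulo PRIME.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(i, q(i)) for i in range(1, n + 1)]


def reconstruct(shares):
    """Recover q(0) from k or more shares by Lagrange interpolation at x = 0."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret


def add_shares(*share_sets):
    """Add, provider by provider, the shares of several secrets.

    The sum of the sharing polynomials is itself a degree-(k-1) polynomial
    whose constant term is the sum of the secrets (mod PRIME), so each
    provider can compute its share of a SUM without seeing any single value.
    """
    totals = {}
    for shares in share_sets:
        for x, y in shares:
            totals[x] = (totals.get(x, 0) + y) % PRIME
    return sorted(totals.items())


if __name__ == "__main__":
    # Hypothetical example values, e.g. three salaries in one column.
    values = [52000, 61000, 47500]
    shared = [split_secret(v, k=2, n=3) for v in values]
    # Any 2 of the 3 providers suffice to rebuild a value.
    print(reconstruct(shared[0][1:]))            # 52000
    print(reconstruct(add_shares(*shared)[:2]))  # 160500
```

Choosing k < n is what makes the layout fault-tolerant in this sketch: with k = 2 and n = 3, any single provider can be unavailable and every value is still recoverable. No single provider ever holds enough information to learn a stored value.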
I. INTRODUCTION
Internet-scale computing has resulted in dramatic changes in the design and deployment of information technology infrastructure components. Cloud computing has been gaining popularity in the commercial world, where various computing-based capabilities are provided as a service to clients. This relieves those clients of the need to develop expertise in these capabilities and of the need to manage and maintain the software that provides these services. Amazon, for example, has created the EC2 service, which provides clients with scalable servers, as well as the S3 service, which provides scalable storage. Recently, NSF partnered with Google and IBM to offer academic institutions access to large-scale distributed infrastructure under the NSF CLuE program. There has clearly been a radical paradigm shift due to the wide acceptance of, and reliance on, Internet and Web-based technologies.
One of the reasons for the success of Internet-scale computing is the role it has played in eliminating the size of an enterprise as a critical factor in its economic success. An excellent example of this change is the notion of data centers, which provide clients with the physical infrastructure needed to host their computer systems, including redundant power supplies, high-bandwidth communication capabilities, environment monitoring, and security services. Data centers eliminate the need for small companies to make a large capital expenditure on building the infrastructure needed to reach a global customer base. The data center model has been effective because it allows an enterprise of any size to grow with the popularity of its product or service, while also allowing it to cut its losses if the launched product or service does not succeed. During the past few years, we have seen a rapid acceleration of innovation in new business paradigms, and data centers have played a very important role in this process.
In addition to the physical infrastructure needed to support Internet and Web-based applications, such applications have data management needs as well. To enable more sophisticated business analysis and user customization, e-commerce applications maintain data or log information for every user interaction rather than storing only transaction data (e.g., sales transactions in the retail industry). This trend has resulted in an explosive growth in the amount of data associated
IEEE International Conference on Data Engineering
1084-4627/09 $25.00 © 2009 IEEE
DOI 10.1109/ICDE.2009.151
1709