Database Management as a Service: Challenges and Opportunities

Divyakant Agrawal #1, Amr El Abbadi #2, Fatih Emekci *3, Ahmed Metwally @4

# Department of Computer Science, University of California at Santa Barbara
Santa Barbara, CA 93106, USA
1 agrawal@cs.ucsb.edu
2 amr@cs.ucsb.edu

* LinkedIn Corporation
2029 Stierlin Court, Mountain View, CA 94043, USA
3 fatihemekci@gmail.com

@ Google Inc.
1600 Amphitheatre Parkway, Mountain View, CA 94043, USA
4 ametwally@gmail.com

Abstract: Data outsourcing, or database as a service, is a new paradigm for data management in which a third-party service provider hosts a database as a service. The service provides data management for its customers and thus obviates the need for the service user to purchase expensive hardware and software, deal with software upgrades, and hire professionals for administrative and maintenance tasks. Because an external database service promises reliable data storage at low cost, it is very attractive to companies. Such a service would also provide universal access, through the Internet, to private data stored at reliable and secure sites. However, recent government legislation, competition among companies, and database thefts require companies to use secure and privacy-preserving data management techniques. The data provider, therefore, needs to guarantee that the data is secure and be able to execute queries on the data, and the results of the queries must also be secure and not visible to the data provider. Current research has focused only on how to index and query encrypted data. However, querying encrypted data is computationally very expensive. Providing an efficient trust mechanism that pushes both database service providers and clients to behave honestly has emerged as one of the most important problems to solve before data outsourcing can become a viable paradigm. In this paper, we describe scalable privacy-preserving algorithms for data outsourcing.
Instead of encryption, which is computationally expensive, we use distribution across multiple data provider sites and information-theoretically proven secret sharing algorithms as the basis for privacy-preserving outsourcing. The technical contribution of this paper is the establishment and development of a framework for efficient, fault-tolerant, scalable, and theoretically secure privacy-preserving data outsourcing that supports a diversity of database operations executed on different types of data, and which can even leverage publicly available data sets.

I. INTRODUCTION

Internet-scale computing has resulted in dramatic changes in the design and deployment of information technology infrastructure components. Cloud computing has been gaining popularity in the commercial world, where various computing-based capabilities are provided as a service to clients, relieving those clients of the need to develop expertise in these capabilities, as well as the need to manage and maintain the software providing these services. Amazon, for example, has created the EC2 service, which provides clients with scalable servers, and the S3 service, which provides clients with scalable storage. Recently, NSF partnered with Google and IBM to offer academic institutions access to large-scale distributed infrastructure under the NSF CLuE program. There has clearly been a radical paradigm shift due to the wide acceptance of and reliance on Internet and Web-based technologies.

One of the reasons for the success of Internet-scale computing is the role it has played in eliminating the size of an enterprise as a critical factor in its economic success. An excellent example of this change is the notion of data centers, which provide clients with the physical infrastructure needed to host their computer systems, including redundant power supplies, high-bandwidth communication capabilities, environment monitoring, and security services.
Data centers eliminate the need for small companies to make a large capital expenditure on building an infrastructure to create a global customer base. The data center model has been effective because it allows an enterprise of any size to manage growth in step with the popularity of its product or service, while also allowing the enterprise to cut its losses if the launched product or service does not succeed. During the past few years we have seen a rapid acceleration of innovation in new business paradigms, and data centers have played a very important role in this process.

In addition to the physical infrastructure needed to support Internet and Web-based applications, such applications have data management needs as well. To enable more sophisticated business analysis and user customization, e-commerce applications maintain data or log information for every user interaction rather than storing only transaction data (e.g., sales transactions in the retail industry). This trend has resulted in an explosive growth in the amount of data associated
IEEE International Conference on Data Engineering 1084-4627/09 $25.00 © 2009 IEEE DOI 10.1109/ICDE.2009.151 1709