27th International Conference on Software Engineering and Knowledge Engineering

A Reliable and Secure Cloud Storage Schema Using Multiple Service Providers

Haiping Xu and Deepti Bhalerao
Computer and Information Science Department
University of Massachusetts Dartmouth, North Dartmouth, MA 02747, USA
{hxu, dbhalerao}@umassd.edu

Abstract—Despite the many advantages provided by cloud-based storage services, there are still major concerns about the security, reliability, and confidentiality of data stored in the cloud. In this paper, we propose a reliable and secure cloud storage schema using multiple service providers. Different from existing approaches that achieve data reliability through redundancy at the server side, our schema can be implemented entirely at the client side. In our approach, we view multiple cloud-based storage services as virtual independent disks for storing redundant data encoded using erasure codes. Since no individual cloud service provider has access to a user's complete data, the data stored in the cloud cannot be easily compromised. Furthermore, the failure or disconnection of a service provider does not result in the loss of a user's data, as the missing data pieces can be readily recovered. To demonstrate the feasibility of our approach, we developed a prototype cloud-based storage system that breaks a data file into multiple data pieces, generates an optimal number of checksum pieces, and uploads them to multiple cloud storages. Upon the failure of a cloud storage service, the application can quickly restore the original data file from the available pieces. The experimental results show that our approach is not only secure and fault-tolerant, but also very efficient due to concurrent data processing.

Keywords—cloud storage; reliability; data security; erasure codes; cloud service provider; integer linear programming.

I. INTRODUCTION

As an ever-growing data storage solution, cloud-based storage services have become a highly practical way for both individuals and businesses to store their data online. The pay-as-per-use model of cloud computing eliminates the upfront commitment from cloud users, allowing them to start small and acquire additional resources only when they are needed. However, since data storage locations and security measures at the server site are typically unknown, most users have not yet become comfortable exploiting the full potential of the cloud. Many recent incidents have made users question the reliability of cloud storage services. For example, in May 2014, Adobe's ID service went down, leaving Creative Cloud users locked out of their software and accounts for over 24 hours [1]. In early 2013, the Dropbox service suffered a major outage that kept users offline and unable to synchronize with their desktop apps for more than 15 hours [2]. Prolonged cloud data service outages and security breaches can be fatal for businesses in data-critical domains such as healthcare, banking, and finance.

Today, almost all cloud service providers (CSPs) have implemented fault-tolerant mechanisms at their server sides to recover original data after service failures or data corruption. Such mechanisms are suitable during scheduled maintenance or for a small number of hard disk failures. However, they are of no use to end users for ensuring the reliability and security of their cloud data when a major cloud service fails or has been compromised. Hence, to achieve high reliability and security for critical data, users should not depend upon a single cloud service provider. In this paper, we propose an approach that provides security and fault tolerance for a user's data from the client side.
In our approach, we decompose an original data file into multiple data pieces, and generate checksum pieces using erasure codes [3]. The pieces are spread across multiple cloud services, from which they can be retrieved and combined to recover the original file. Because data redundancy is achieved at the software level across multiple cloud service providers, the original data can be recovered even during a cloud outage in which some cloud service fails completely. With this approach, a user's data cannot be easily compromised by unauthorized access or security breaches, as no single cloud service has complete knowledge of the user's data. Thus, users retain sole control of their cloud data and do not need to rely on the security measures provided by cloud service providers. Finally, to improve the network performance of our approach, we adopt multithreading to fully utilize the network bandwidth and minimize the time required to access data over the cloud.

There have been many research efforts on using erasure codes at the server side to make cloud storage services reliable. Huang et al. proposed the use of erasure codes in Windows Azure storage [4]. They introduced a new family of erasure codes, called Local Reconstruction Codes (LRC), that reduces the number of coded fragments required for data reconstruction. Gomez et al. introduced a novel persistency technique that leverages erasure codes to save data reliably in Infrastructure as a Service (IaaS) clouds [5]. They presented a scalable erasure coding algorithm that supports a high degree of reliability for local storage at the cost of low computational overhead and a minimal amount of communication. Khan et al. provided guidance for deploying erasure coding in cloud file systems to support load balancing and incremental scalability in data centers [6].
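The decompose-encode-recover idea above can be illustrated with the simplest possible erasure code: k data pieces plus a single XOR checksum piece, which tolerates the loss of any one piece. This is only a minimal sketch for intuition; the paper's schema uses general erasure codes that can tolerate multiple simultaneous provider failures, and the function names here are illustrative, not part of the authors' prototype.

```python
# Minimal client-side erasure-coding sketch: split a file's bytes into
# k equal-size data pieces and add one XOR parity piece. Any single
# missing piece (e.g., one failed cloud provider) can be rebuilt from
# the survivors. A real deployment would use Reed-Solomon codes so
# that m > 1 losses can be tolerated.

def split_and_encode(data: bytes, k: int) -> list:
    """Return k data pieces plus one XOR parity piece (k + 1 total)."""
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(k * size, b"\0")      # pad so pieces align
    pieces = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for p in pieces:                          # parity = p0 ^ p1 ^ ... ^ pk-1
        for j, byte in enumerate(p):
            parity[j] ^= byte
    return pieces + [bytes(parity)]

def recover(pieces: list, missing_index: int) -> bytes:
    """Rebuild the piece at missing_index by XOR-ing all other pieces."""
    size = len(pieces[0] if missing_index != 0 else pieces[1])
    acc = bytearray(size)
    for i, p in enumerate(pieces):
        if i != missing_index:
            for j, byte in enumerate(p):
                acc[j] ^= byte
    return bytes(acc)

# Usage: encode a file, simulate the loss of one provider's piece,
# and reconstruct it from the remaining pieces.
pieces = split_and_encode(b"confidential report", k=4)
rebuilt = recover(pieces, missing_index=2)
assert rebuilt == pieces[2]
```

Because each provider holds only one piece, no single provider can reconstruct the file, which is precisely the confidentiality property the approach relies on.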
Their proposed approach can prevent correlated failures with data

(DOI Reference Number: 10.18293/SEKE2015-045)