Data On-boarding in Federated Storage Clouds

Gil Vernik, Alexandra Shulman-Peleg, Sebastian Dippl, Ciro Formisano, Michael C. Jaeger, Elliot K. Kolodner, Massimo Villari§

IBM Haifa Research Lab
Siemens Corporate Research and Technologies
Engineering Ingegneria Informatica SPA
§University of Messina

Abstract—One of the main obstacles hindering wider adoption of storage cloud services is vendor lock-in, a situation in which large amounts of data placed in one storage system cannot be migrated to another vendor, e.g., due to time and cost considerations. To prevent this situation we present an advanced on-boarding federation mechanism, enabling a cloud to add a special federation layer to efficiently import data from other storage clouds. This is achieved without depending on any special function from the other clouds. We design a generic, modular on-boarding architecture and demonstrate its implementation as part of VISION Cloud, a large-scale storage cloud designed for content-centric data. Our system is capable of integrating storage data from various clouds, providing a common global view of storage data. Users can access the data through the new cloud provider immediately after setup, maintaining the normal operation of applications, so they do not need to wait for the data migration process to complete. Finally, we analyze the payment models of existing storage clouds, showing that transferring the data via on-boarding federation with a direct link between clouds can lead to significant time and cost savings.

1 INTRODUCTION

Cloud platforms should fulfill the requirements for scalability and flexibility, allowing resources to be rapidly redeployed and moved. This is achievable for compute resources, but it is not common practice for storage. Existing storage clouds still do not allow true data mobility, and data cannot easily be migrated across providers. The work “Above the Clouds” of Armbrust et al.
[1] named the problem of “vendor lock-in” of stored data as the second among the top ten obstacles to the growth of cloud computing. The authors cited the lack of standardized storage access APIs as one reason. Today, the Cloud Data Management Interface (CDMI) standard from SNIA [2] exists. However, it lacks adoption by the larger cloud storage providers. In addition, applications generate so much data today that the resulting transfer time could require long interruptions to services.

There are companies and products that specialize in moving data (e.g., Nasuni, Racemi, Gladinet) or in providing a common view of data (e.g., IBM CastIron). The main business case for such products is either the migration from a classic IT infrastructure to a cloud offering, or the prevention of vendor lock-in. However, third-party tools cannot fully leverage the underlying storage cloud platforms for faster and more transparent migration.

The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 257019.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies.

Our goal is to prevent vendor lock-in by introducing a special federation layer as part of the storage cloud infrastructure. Our approach covers three areas: (1) standard API and interoperability; (2) efficient and transparent data migration; and (3) system security and access control. To cover the first area, we adhere to the CDMI standard, allowing for interoperability between CDMI-compliant storage providers.
Second, we introduce the concept of on-boarding federation, allowing an enterprise to move its data from one storage cloud provider to another (e.g., due to economic, legal or functional considerations) while providing continuous access and a unified view over the data in the old cloud, the data in the new cloud, and the data in transit. Our approach forms a federation between the clouds. The data is migrated by a background process without interrupting services. The third area is the access control architecture, which targets the federation of two autonomous access control systems protecting the data in the two clouds. It is important to note that we ensure data consistency and completeness without introducing any centralized components or requiring any modifications to the old cloud. This preserves the benefits of distribution and scalability, and also makes our architecture suitable for future deployments with other storage cloud systems.

Here, we present an implementation over VISION Cloud [3], a scalable and federated storage system, developed in an EU-funded project, that provides content-centric access to its storage. In contrast to public cloud offerings such as Amazon S3 and Windows Azure Blob Storage, or specific hardware appliances, VISION Cloud stresses support for rich metadata as an integral part of the storage. As part of this project, partners also develop use case applications for telecommunications services, media production systems, healthcare services and enterprise applications.

2013 IEEE Sixth International Conference on Cloud Computing. 978-0-7695-5028-2/13 $26.00 © 2013 IEEE. DOI 10.1109/CLOUD.2013.54

This paper is organized as follows. In Section 2, we describe our federation architecture and its implementation over VISION Cloud. Section 3 discusses the security issues, comparing several access control models; Section 4 describes their implementation. Section 5 presents the overhead of fed-