Backup Aware Object based Storage

Girish Moodalbail, Nagapramod Mandagere, Aravindan Raghuveer, Sunil Subramanya, David Du
DTC Intelligent Storage Consortium (DISC)
University of Minnesota, Minneapolis, MN 55455
girishmg@cs.umn.edu, npramod@cs.umn.edu, aravind@cs.umn.edu, subram@cs.umn.edu, du@cs.umn.edu

Abstract— Data management costs have skyrocketed due to the increase in both the volume and complexity of storage solutions. Object based storage (OSD) is a new paradigm that has gained increasing acceptance due to its ability to embed intelligence in the storage devices, thereby easing management. Even in this age of fault tolerant systems, data backup is still the only means of ensuring absolute recovery. Traditional backup solutions are limited by the features of block based storage. Because OSD stores data as objects (data + metadata), it offers a whole new set of features that were not possible in traditional systems [1]. In this work we explore the opportunities and implications of OSD enabled backup. We propose and implement a new backup management system capable of exploiting the features of OSD. Salient features of our proposed design include fine grained policy specification and intelligent, non-intrusive backup scheduling. Our prototype evaluation shows that our proposed backup solution outperforms traditional backup solutions both in the features it offers and in performance. Specifically, the non-intrusive backup scheduling minimizes the impact on real time data access.

I. INTRODUCTION

One of the main concerns of enterprises today is the Total Cost of Ownership (TCO). In the data storage domain, this boils down to the Cost of Acquisition and the Cost of Management/Administration. In recent times, the complexity of systems has increased, requiring more skilled labour to manage and maintain these systems, leading directly to an unprecedented increase in the Cost of Management.
In the day to day operation of any data management system, typical management tasks include performance monitoring, system health monitoring, and information life-cycle management tasks such as regular data backups, archival of stale data, etc. The recent trend in both industry and the research community has been towards building Self Managing and Self Monitoring Storage Systems. The goal of these systems is to minimize human or administrator intervention during the day to day operation of the storage solution. This move towards adding more intelligence to the storage systems is fueled not just by the benefits of reduced management cost but also by potential performance benefits that were not available in traditional systems.

Building intelligent storage devices requires an in-depth understanding of the metadata that describes the stored data. Traditional block based storage systems do not have the capability to interpret any metadata corresponding to the data that they store. This is in part because of the interface between the applications that access storage devices and the storage devices themselves, and in part because of the way traditional systems store data. Object based Storage (OSD) is a new paradigm that can be used to build intelligent storage devices. In recent times, OSD has seen rapid adoption as a result of the standardization of the Object Storage Interface by the T10 committee [2]. OSD revolves around the idea of storing additional metadata along with the data itself and providing an expressive interface for manipulating this metadata. Data is stored as objects, with attributes representing the corresponding metadata.

Data backups are a critical piece of any Information Lifecycle Management (ILM) scheme. Backups are the only way of ensuring absolute fault tolerance. In large enterprises, one or more administrators make use of commercial enterprise class software to back up important data.
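The object model behind OSD, data paired with interpretable metadata attributes, can be sketched roughly as follows. This is a minimal illustration only; the class name, attribute keys, and method names are our own and are not part of the T10 Object Storage Interface.

```python
class StorageObject:
    """A sketch of an OSD object: raw data plus a dictionary of
    metadata attributes the storage device itself can interpret."""

    def __init__(self, object_id, data, attributes=None):
        self.object_id = object_id
        self.data = data
        # Attributes let the device reason about the data it stores,
        # e.g. when making backup or lifecycle decisions.
        self.attributes = dict(attributes or {})

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def get_attribute(self, key, default=None):
        return self.attributes.get(key, default)


# Hypothetical usage: the attribute names below are illustrative.
obj = StorageObject(42, b"payload bytes",
                    {"owner": "alice", "backup.priority": "high"})
obj.set_attribute("last_modified", 1700000000)
print(obj.get_attribute("backup.priority"))  # high
```

Because the attributes travel with the object rather than living only in a host-side file system, a device exposing such an interface can act on metadata like a backup priority without any knowledge of the file system above it.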
Though the degree of automation of this process has increased as backup management software has become more intelligent and complex, there is still a need for supervision by a skilled administrator. This can be attributed to two main factors, namely static policy specification and scheduling of the backup window.

Policy specification is usually the job of the system administrator, who specifies the policies at the backup server(s) as input to the Backup Management Software. Hence, it is up to the system administrator(s) to decide the importance of the data stored on the storage systems they manage. In large enterprises, the volume of data is huge, so the system administrator's job of deciding which data needs to be backed up, and when, becomes very complicated. In addition, data importance varies over time, rendering static policies ineffective. To aid the administrators, system users and administrators generally agree beforehand on backup policies. For instance, one simple example could be that all files present under the /user-name/project directory of every user are backed up twice daily. This makes enforcement of the backup policy by the system administrator(s) less complex. But in doing so, the users are tied to organizing all their important data in the specified locations only. And if the users do not exercise caution and dump unimportant data in the above mentioned directories, the utilization of the backup solution drops. Another commonly used option is to take a complete backup of the system irrespective of data importance. Though this approach simplifies the job of administration, the utilization of the backup solution is suboptimal.

Backup is usually a resource hungry operation. Performing