International Journal of Enhanced Research in Management & Computer Applications, ISSN: 2319-7471, Vol. 2 Issue 9, Nov.-Dec., 2013, pp: (30-35), Available online at: www.erpublications.com

Reinforcement Learning Approach for Data Migration in Hierarchical Storage Systems

T.G. Lakshmi, R.R. Sedamkar, Harshali Patil
Department of Computer Engineering, Thakur College of Engineering and Technology, Kandivali, Mumbai, India

Abstract: Existing data migration techniques in Hierarchical Storage Systems (HSS) have several shortcomings. The first and most important is that data migration policies are user defined, and hence static and reactive. Second, data migration at the host side has not yet been fully explored. Other major drawbacks are that each storage tier is modelled as an agent, that the data migration methodology is I/O triggered, and that the tier cost is represented as a complex fuzzy rule base (FRB). This paper proposes a simple, single data migration agent for the HSS. The agent will be a standalone daemon that implements the Reinforcement Learning (RL) algorithm. It will formulate and tune the policies on which data migration is based. The proposed model aims to achieve results comparable with existing systems in data migration, input/output queue length and response time of storage tiers.

Keywords: HSS, Data Migration, Reinforcement Learning.

1. INTRODUCTION

Businesses find it very difficult to maintain data center infrastructure. With the rate of data generation growing by 55% annually, the cost of (1) acquisition of new disk arrays, (2) backup, (3) floor space and (4) electricity is growing exponentially. Data storage management is a difficult problem because data should be placed on resources that are able to fulfill the user's requests. It is also necessary to optimize the usage of resources and minimize the cost of their usage [1].
The traditional solution is to buy more disk space, or to store the data on external media and retrieve it as required [2]. Hierarchical Storage Management (HSM) provides an automatic way of managing and distributing data between the different storage layers to meet user needs for accessing data while minimizing the overall cost [3]. It uses the storage space effectively according to its capability while also meeting customer demands [3]. Data placement in HSM is done to minimize the cost of data storage while maximizing accessibility.

In HSM the storage devices are separated into tiers based on their capacity and speed; for example, TIER I consists of disks that have a very high I/O speed and are hence expensive. Frequently accessed data is placed on higher tiers and the least frequently accessed data on lower tiers. In essence, data lives where it is most efficient. This gives rise to significant savings and revenue opportunities for the business.

As Fig. 1 shows, the fastest storage components are the most expensive. To maximize the return on investment (ROI) in computing resources, the resources must be utilized to their best capability. The data arrangement is not static: data is moved between the tiers based on the current system's needs. The challenge is to find the right placement of the data.

Figure 1: Cost and Speed of Access of different storage components
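The placement rule described above (frequently accessed data on higher tiers, least frequently accessed data on lower tiers) can be illustrated with a minimal Python sketch. This is a static, frequency-based placement only, not the RL-based migration agent this paper proposes. The `Tier` class, the `place_by_frequency` function, the item-count capacities and the sample access counts are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    capacity: int  # assumption: capacity measured in number of data items
    items: list = field(default_factory=list)


def place_by_frequency(access_counts, tiers):
    """Fill tiers fastest-first with the most frequently accessed items.

    access_counts: dict mapping item name -> access count (illustrative).
    tiers: list of Tier objects ordered from fastest to slowest.
    Returns a dict mapping item name -> tier name.
    """
    # Rank items by descending access count; ties are broken by name
    # so that the result is deterministic.
    ranked = sorted(access_counts, key=lambda k: (-access_counts[k], k))

    start = 0
    for tier in tiers:
        # Each tier takes the next-hottest items up to its capacity.
        tier.items = ranked[start:start + tier.capacity]
        start += tier.capacity

    if start < len(ranked):
        raise ValueError("total tier capacity is too small for all items")

    return {item: tier.name for tier in tiers for item in tier.items}


# Hypothetical workload: access counts observed over some window.
counts = {"a.db": 120, "b.log": 3, "c.img": 45, "d.bak": 0}
tiers = [Tier("TIER I", 1), Tier("TIER II", 2), Tier("TIER III", 10)]
placement = place_by_frequency(counts, tiers)
```

Under these assumed counts, the hottest item lands on TIER I and the never-accessed backup on the slowest tier. Because the rule is recomputed only when it is called, it is static and reactive in the sense the abstract criticizes; an adaptive policy would have to revise placements as access patterns change.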