Reliability Analysis of An Energy-Aware RAID System

Shu Yin, Yun Tian, Jiong Xie, and Xiao Qin
Department of Computer Science and Software Engineering
Auburn University, Auburn, AL 36849
Email: {szy0004, tianyun, jzx0009, xqin}@auburn.edu

Mohammed Alghamdi
Department of Computer Science
Al-Baha University, Al-Baha City, Kingdom of Saudi Arabia
Email: mialmushilah@bu.edu.sa

Xiaojun Ruan
Department of Computer Science
West Chester University of Pennsylvania, West Chester, PA 19383
Email: xruan@wcupa.edu

Meikang Qiu
Department of Electrical and Computer Engineering
University of Kentucky, Lexington, Kentucky, 40506
Email: mqiu@engr.uky.edu

Abstract—We develop a mathematical model, MREED, to quantitatively evaluate the failure rate of energy-efficient parallel storage systems. The Power-Aware Redundant Array of Inexpensive Disks (PARAID) aims to reduce the energy use of commodity server-class disks without specialized hardware. PARAID uses a skewed striping pattern to adapt to the system load by changing the number of powered disks. By spinning down disks during light workloads, PARAID can reduce power consumption while still meeting performance demands. We show that MREED can be used to estimate the reliability of a five-disk PARAID-0 system, and we validate the accuracy of MREED using the DiskSim simulator. Our results show that MREED can rely on file access patterns to estimate system utilization correctly. Furthermore, even though PARAID may achieve reasonable reliability, our model shows that PARAID's reliability is affected by data locality.

Keywords-Parallel storage system, RAID, energy-efficient, reliability

I. INTRODUCTION

Existing reliability models for conventional parallel and distributed disk systems do not consider energy-saving issues or data-striping mechanisms.
In this paper, we first study the reliability of a parallel disk system equipped with the PARAID [1] technique by employing MREED, a Mathematical Reliability model for Energy-Efficient RAID systems. As a mathematical model, MREED has the advantage of revealing the reliability trends of energy-aware storage systems. However, validating MREED is challenging. To address its correctness, we validate the access-rate-utilization model in MREED, which converts the file access rate into the utilization of the storage system. Finally, we study the impact of the I/O load skewing technique, gear shifting, on the reliability of PARAID, a well-known energy-aware data-striping storage system.

Existing energy conservation techniques can yield significant energy savings in disks. While some schemes, such as cache-based energy-saving approaches, normally have a marginal impact on disk reliability, many others (e.g., dynamic power management and workload skew techniques) inevitably have noticeable adverse impacts on the reliability of storage systems [2][3]. For example, dynamic power management (DPM) techniques [4][5][6] save energy through frequent disk spin-downs and spin-ups, which in turn can shorten disk lifetime. Other energy-saving approaches include redundancy techniques [7][8][9][10], workload skew [11][12][13], and multi-speed settings [14][15]. We focus on the reliability of RAID systems, to which existing energy conservation techniques cannot be applied for the following reasons. First, conventional RAIDs balance the I/O load across all disks in the array to maximize parallelism and performance, so all disks keep spinning even under a light load, leaving no opportunity to spin down any disk. Second, server-class disks are not designed for frequent power cycles, which significantly reduce their life expectancy. Third, server systems cannot rely on caching and dynamic power management, because servers are too busy to have long idle periods.
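The access-rate-utilization model described above converts a file access rate into disk utilization. This section does not give MREED's exact conversion, so the following is only a minimal sketch assuming the standard queueing Utilization Law (U = X * S, access rate times mean service time). The disk parameters in `DiskParams` are hypothetical placeholders, and spreading requests evenly over the active disks (each request served by a single disk) is a simplifying assumption, not PARAID's actual data layout.

```python
from dataclasses import dataclass


@dataclass
class DiskParams:
    """Hypothetical disk parameters (not taken from the paper)."""
    avg_seek_s: float = 0.0085      # average seek time in seconds
    rpm: int = 7200                 # spindle speed
    bandwidth_mb_s: float = 100.0   # sustained transfer rate in MB/s


def mean_service_time(disk: DiskParams, request_mb: float) -> float:
    """Seek + average rotational latency (half a turn) + transfer, in seconds."""
    rotational_latency = 0.5 * 60.0 / disk.rpm
    transfer = request_mb / disk.bandwidth_mb_s
    return disk.avg_seek_s + rotational_latency + transfer


def utilization(access_rate: float, disk: DiskParams, request_mb: float,
                active_disks: int = 1) -> float:
    """Utilization Law U = X * S, with requests split evenly over active disks.

    access_rate is in requests per second; the result is capped at 1.0,
    which corresponds to a saturated disk.
    """
    if active_disks < 1:
        raise ValueError("at least one disk must be active")
    per_disk_rate = access_rate / active_disks
    return min(1.0, per_disk_rate * mean_service_time(disk, request_mb))


if __name__ == "__main__":
    d = DiskParams()
    # 20 requests/s of 1 MB each, served by 5 powered disks vs. 2 powered disks
    print(utilization(20, d, 1.0, active_disks=5))
    print(utilization(20, d, 1.0, active_disks=2))
```

The comparison in the usage block hints at why powering fewer disks raises per-disk utilization: the same access rate is concentrated on fewer spindles.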
In this paper, our contributions are summarized as follows:
1) We propose MREED, a reliability model for Power-Aware RAID (i.e., an energy-aware data-striping parallel storage system);
2) We introduce Weibull distribution analysis into MREED. Using the utilization of a storage system as an input, we can estimate and forecast the annual failure rate (AFR) of the system;
3) We validate the access-rate-utilization model of MREED;
4) We study the impacts of gear-shifting schemes on the reliability of PARAID, focusing on PARAID-0, an energy-aware RAID-0 system.

Experimental results show that gear shifting affects the reliability of parallel disks for two reasons. First, disks that work in all gears tend to have higher I/O utilization than disks that work only in high gears. Second, disks with high utilization are likely to have a higher risk of failure.

The remainder of this paper is organized as follows. Section II presents an overview of the MREED model. In Section III, we apply MREED to quantitatively estimate the reliability of PARAID. Section IV demonstrates a solution to validate the access-rate-utilization model in MREED. Section V presents experimental results and performance evaluation. Section VI discusses related work. Finally, Section VII concludes the paper with a discussion.

II. THE MREED MODELING FRAMEWORK

A. Overview

MREED is a framework developed to model the reliability of parallel disk systems employing energy conservation techniques. In the MREED framework, we evaluate the reliability impacts of a specific energy-saving technique: the Power-Aware RAID. One critical module in MREED models the impact of energy-efficient schemes on the utilization and power-state transition frequency of each disk in a parallel disk system.
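The observation that disks active in all gears accumulate more utilization, and the module that tracks per-disk utilization and power-state transitions, can be illustrated with a small simulation. This is a hedged sketch, not PARAID's real gear-shifting policy: the threshold rule in the hypothetical `choose_gear`, the load unit (disk-equivalents of busy time per interval), and the even split of load across powered disks are all assumptions made for illustration. In gear n, disks 0 to n-1 are powered, mimicking a skewed layout where low-numbered disks serve every gear.

```python
from typing import List, Optional, Sequence, Tuple


def choose_gear(load: float, gears: Sequence[int], max_util: float) -> int:
    """Pick the smallest gear (number of powered disks) whose per-disk
    utilization stays at or below max_util; otherwise use the largest gear.

    `load` is aggregate demand in disk-equivalents of busy time, e.g. 1.5
    means one and a half disks' worth of work per interval (hypothetical).
    """
    for n in sorted(gears):
        if load / n <= max_util:
            return n
    return max(gears)


def simulate_gears(loads: Sequence[float], gears: Sequence[int],
                   total_disks: int, max_util: float = 0.6
                   ) -> Tuple[List[float], List[int]]:
    """Return (average utilization, power-state transitions) per disk.

    In gear n, disks 0..n-1 are powered, so low-numbered disks serve every
    gear (skewed striping). Load is split evenly over the powered disks.
    """
    if not loads or not gears:
        raise ValueError("need at least one load interval and one gear")
    busy = [0.0] * total_disks
    transitions = [0] * total_disks
    prev: Optional[List[bool]] = None
    for load in loads:
        n = choose_gear(load, gears, max_util)
        per_disk = min(1.0, load / n)      # a disk cannot be more than 100% busy
        for d in range(n):
            busy[d] += per_disk
        powered = [d < n for d in range(total_disks)]
        if prev is not None:
            for d in range(total_disks):
                if powered[d] != prev[d]:  # a spin-up or a spin-down
                    transitions[d] += 1
        prev = powered
    return [b / len(loads) for b in busy], transitions


if __name__ == "__main__":
    util, trans = simulate_gears([0.5, 1.0, 2.8, 0.6, 1.5],
                                 gears=[2, 3, 5], total_disks=5)
    print(util, trans)
```

With the sample loads, disks 0 and 1 (powered in every gear) average about 0.42 utilization and never change power state, whereas disks 3 and 4 average about 0.11 but are spun up and down twice. Under these assumptions, the sketch reproduces the qualitative trade-off between utilization and transition frequency that MREED feeds into its failure-rate module.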
Another important module in MREED calculates the annual failure rate of each disk as a function of the disk's utilization and power-state transition frequency. Given the annual failure rate of each disk in the parallel disk system, MREED derives the reliability of the entire energy-efficient parallel disk system. We use MREED to study the reliability of a parallel disk system equipped with the PARAID technique.

978-1-4673-0012-4/11/$26.00 ©2011 IEEE
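The two steps above (per-disk AFR, then system reliability) can be sketched as follows. The Weibull failure probability F(t) = 1 - exp(-(t/scale)^shape) is standard, but how MREED maps utilization and transition frequency to Weibull parameters is not specified in this section. The hypothetical `disk_afr` below simply shrinks the scale parameter linearly with a "stress" term; its weights and base values are placeholders, not MREED's. The system step relies on a known property of RAID-0: with no redundancy, the array survives a year only if every disk does, so assuming independent failures, AFR_sys = 1 - prod(1 - AFR_i).

```python
import math
from typing import Sequence

HOURS_PER_YEAR = 8760.0


def weibull_cdf(t_hours: float, shape: float, scale_hours: float) -> float:
    """Weibull failure probability F(t) = 1 - exp(-(t / scale) ** shape)."""
    return 1.0 - math.exp(-((t_hours / scale_hours) ** shape))


def disk_afr(utilization: float, transitions_per_month: float,
             shape: float = 1.2, base_scale_hours: float = 1.0e6,
             util_weight: float = 1.0, transition_weight: float = 0.01) -> float:
    """Hypothetical AFR model: the Weibull scale parameter shrinks as
    utilization and power-state transition frequency grow. All weights and
    base parameters are illustrative placeholders, not values from MREED.
    """
    stress = 1.0 + util_weight * utilization + transition_weight * transitions_per_month
    return weibull_cdf(HOURS_PER_YEAR, shape, base_scale_hours / stress)


def raid0_afr(disk_afrs: Sequence[float]) -> float:
    """RAID-0 has no redundancy, so the array fails if any disk fails.
    Assuming independent disk failures: AFR_sys = 1 - prod(1 - AFR_i).
    """
    survival = 1.0
    for afr in disk_afrs:
        survival *= 1.0 - afr
    return 1.0 - survival


if __name__ == "__main__":
    # Hypothetical per-disk (utilization, transitions/month) for a 5-disk array
    profile = [(0.42, 0), (0.42, 0), (0.21, 3), (0.11, 2), (0.11, 2)]
    afrs = [disk_afr(u, t) for u, t in profile]
    print([round(a, 5) for a in afrs], round(raid0_afr(afrs), 5))
```

The series composition is why a single heavily loaded disk dominates the array's failure rate in a non-redundant layout; a redundant PARAID level would need a different composition rule.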