An Efficient Access Forwarding Method Based On Caches On Storage Nodes

Dai Kobayashi 1, Akitsugu Watanabe 1, Ryo Taguchi 2, Toshihiro Uehara 2 and Haruo Yokota 3,1

1 Department of Computer Science, Tokyo Institute of Technology, JAPAN
{daik,aki}@de.cs.titech.ac.jp
2 Science and Technical Research Laboratories, Japan Broadcasting Corporation, JAPAN
{taguchi.r-cs,uehara.t-jy}@nhk.or.jp
3 Global Scientific Information and Computing Center, Tokyo Institute of Technology, JAPAN
yokota@cs.titech.ac.jp

Abstract

In this paper, we discuss access forwarding schemes for replication that balance the access load among data replicas on multiple storage nodes. In parallel storage systems, it is important to handle skew in the distribution of access requests. Although replication is commonly used to solve such problems, it decreases the hit ratio of the cache memory on each storage node. Through a simple experiment, we first show that forwarding accesses that target less popular data uses cache memory efficiently. We also propose a method that uses the hit ratio of each cache to estimate the popularity of data at low cost. Experimental results show that the method helps replication use the limited cache space efficiently.

1 Introduction

Current large-scale storage systems consist of many storage nodes connected over networks. Recently, storage nodes composed of a CPU for data transfer and data management, RAM for caching, and a few disks have attracted the attention of researchers and vendors [1, 2, 9]. In such storage systems, it is important to handle skew in the distribution of access requests. Otherwise, I/O resources such as the disks or network links of some storage nodes become saturated, making it difficult for the system to guarantee its performance. Replication, widely used in existing storage management, is one way to solve this kind of problem.
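As a rough illustration of how replication relieves skew, the following minimal sketch computes the per-node load when each node forwards a fraction of its requests to the node holding its replica. It is not the authors' method. The ring placement (node i's replica on node (i + 1) mod n), the function names `forwarded_loads` and `imbalance`, and the request rates are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def forwarded_loads(primary_load: Sequence[float], rates: Sequence[float]) -> List[float]:
    """Per-node request load after access forwarding (hypothetical model).

    primary_load[i] is the request rate for data whose primary copy is on node i.
    rates[i] is the fraction of those requests forwarded to node i's replica,
    which is ASSUMED here to live on node (i + 1) % n (ring placement).
    """
    if len(primary_load) != len(rates):
        raise ValueError("primary_load and rates must have the same length")
    n = len(primary_load)
    load = [0.0] * n
    for i, (requests, rate) in enumerate(zip(primary_load, rates)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"rate for node {i} must be in [0, 1]")
        load[i] += requests * (1.0 - rate)      # served from the primary copy
        load[(i + 1) % n] += requests * rate    # forwarded to the replica holder
    return load


def imbalance(load: Sequence[float]) -> float:
    """Busiest node's load divided by the mean load (1.0 means perfectly balanced)."""
    mean = sum(load) / len(load)
    return max(load) / mean


if __name__ == "__main__":
    skewed = [40.0, 10.0, 32.0, 18.0]  # hypothetical skewed request rates per node
    no_forwarding = forwarded_loads(skewed, [0.0, 0.0, 0.0, 0.0])
    balanced = forwarded_loads(skewed, [0.375, 0.0, 0.21875, 0.0])
    print(no_forwarding, imbalance(no_forwarding))
    print(balanced, imbalance(balanced))
```

Without forwarding, the busiest node carries 1.6 times the mean load. With the chosen rates, every node carries the same load. A dynamic scheme would re-tune `rates` as access trends change. This toy model tracks only request counts and ignores the cache effects that are the subject of this paper.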
Large-scale storage systems place replicas of data, rather than erasure codes, on multiple storage nodes for both access distribution and redundancy. Erasure coding tends to be avoided because of its high recovery cost [1]. Access load is spread over multiple nodes by directing each access to one of the replicas of the requested data. In addition, dynamically varying the access distribution rate lets a system adapt to access trends that change over time [8]. This approach, however, tends to decrease the hit ratio of the cache memory on each storage node, because a node's primary data and the replicas of other nodes' data may be stored in the same cache and compete for cache space. It is therefore important to consider how to keep the degradation of the cache hit ratio to a minimum.

In this paper, we first use a simple experiment to examine the relationship between the popularity of the data whose accesses are forwarded and the cache hit ratio, and show that forwarding accesses for less popular data uses cache memory efficiently. We also propose a method that uses the hit ratio of each cache to estimate data popularity at low cost. Experimental results show that the method helps replication use the cache efficiently.

The remainder of this paper is organized as follows. Section 2 describes related work, and Section 3 states the assumptions of the paper. Section 4 discusses the relationship between cache hit ratio and data popularity through preliminary experiments. Section 5 describes the proposed method, and Section 6 reports experiments that show its efficiency.

2 Related Work

D-SPTF [6] and GoogleFS [3] are storage systems that consist of highly functional storage nodes and place replicas of data on several nodes for both high reliability and workload distribution.
DASD dancing [8] and Chained Declustering [4] also use replicas and, in addition, achieve dynamic workload distribution by varying the access distribution rate between replicas, logically shifting workload to other nodes along the chain of replica locations. D-SPTF [6] achieves high performance on a system using