Caching Strategy for Scalable Lookup of Personal Content

Niels Sluijs, Tim Wauters, Bart De Vleeschauwer, Filip De Turck, Bart Dhoedt and Piet Demeester
Department of Information Technology (INTEC)
Ghent University – IBBT
Gaston Crommenlaan 8, Bus 201, 9050 Ghent, Belgium
{firstname.lastname}@intec.ugent.be

Abstract: Today's trend is to create and share personal content, such as music files, digital photos and digital movies. The result is an explosive growth of each user's personal content archive. Managing such a collection, which is often distributed, becomes a complex and time-consuming task. This indicates the need for a personal content management system that provides storage space transparently, is quality-aware, and is available to end-users at any time and at any place. A solution that fulfills this need is a Personal Content Storage Service (PCSS). A key feature of a PCSS is the ability to search worldwide through the dataset of personal files. Because this dataset is extremely large, a centralized approach is no longer feasible; the PCSS therefore uses a structured peer-to-peer network, the Distributed Hash Table (DHT). To further increase the lookup performance, a caching layer is placed between the application layer and the DHT. In this article we present the caching layer and introduce the Request Times Distance (RTD) caching algorithm, which uses popularity and distance metrics to increase the lookup performance. Extending the RTD algorithm with a sliding window and cooperative caching yields a more efficient solution than standard algorithms. The cooperative RTD caching algorithm is evaluated using the PlanetSim simulation framework and shows a performance increase of up to 16% compared to the Least Frequently Used (LFU) caching algorithm.

Keywords: caching strategy; distributed hash table; performance analysis; personal content; scalability

I. INTRODUCTION

An important trend today is to create and share personal content, such as text documents, digital photos, music files and personal movies, with others. End-users can share their personal content using websites such as YouTube (http://www.youtube.com/) for personal movies and Flickr (http://www.flickr.com/) for digital photos. Since the number of personal files grows enormously, managing a personal archive has become a complex and time-consuming task. Nevertheless, end-users expect to be able to locate, control, access and share their personal content from any device, anywhere and at any time. However, current systems that provide storage for personal content, such as YouTube and Flickr, impose limitations in order to cope with the workload. In many cases the file size is limited, file formats are restricted, or it is not possible to access personal content from different types of devices, such as desktop computers, laptops, PDAs (Personal Digital Assistants) and mobile phones. This implies that these systems cannot offer a truly quality-aware and scalable solution for transparent storage of personal content.

A networked solution that offers storage space to end-users in a transparent manner, is accessible from different types of devices, and is able to cope with the expected workload is a Personal Content Storage Service (PCSS) [13]. A successful deployment of a PCSS requires research into each of its aspects, such as user-centric security, content presence, content replica management and content indexing. An essential feature of a PCSS is the ability to search worldwide through the dataset of personal files; this article therefore presents our research on content indexing.

A data structure that allows searching through extremely large datasets is a Distributed Hash Table (DHT). A DHT is a (structured) peer-to-peer network that offers scalable lookup, similar to a hash table.
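As a rough illustration of the hash-table analogy, the following Python sketch distributes keys over a ring of nodes by hashing. It is not the PCSS implementation, nor the routing scheme of any particular DHT: the names `ToyDHT` and `ring_id`, the node names and the example key are hypothetical, and the sketch assumes that every node knows the full ring, so multi-hop routing is not modelled.

```python
import bisect
import hashlib


def ring_id(text: str, bits: int = 32) -> int:
    """Map a string onto the identifier ring [0, 2**bits) using SHA-1."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Toy DHT: each key is owned by the first node clockwise from its hash.

    Assumption: every node knows the full ring, so no multi-hop routing
    is modelled here.
    """

    def __init__(self, node_names):
        # The sorted node identifiers form the ring.
        self._ring = sorted((ring_id(name), name) for name in node_names)
        self._ids = [node_id for node_id, _ in self._ring]
        # One local hash table per node.
        self._stores = {name: {} for name in node_names}

    def responsible_node(self, key: str) -> str:
        """Return the node whose identifier succeeds the key's identifier."""
        # The modulo wraps around to the first node past the end of the ring.
        index = bisect.bisect_left(self._ids, ring_id(key)) % len(self._ring)
        return self._ring[index][1]

    def put(self, key: str, value: str) -> None:
        """Store the pair on the node responsible for the key."""
        self._stores[self.responsible_node(key)][key] = value

    def get(self, key: str):
        """Look the key up on its responsible node; None if it is absent."""
        return self._stores[self.responsible_node(key)].get(key)


# Hypothetical usage: the key is a file name, the value a link to its location.
dht = ToyDHT([f"node-{i}" for i in range(8)])
dht.put("holiday-2009.jpg", "http://www.flickr.com/")
print(dht.responsible_node("holiday-2009.jpg"))
print(dht.get("holiday-2009.jpg"))
```

Because the responsible node is derived from the hash of the key alone, any participant can compute where a pair lives without consulting a central index, but only if it knows the exact key.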
A <key, value>-pair is stored in the DHT, and every node participating in the DHT is able to efficiently locate the value that corresponds to a given key. In the case of a PCSS, the key is for instance a file name, or a set of tags or keywords that describe the personal file. The value is a link to the location where the file is stored, for instance YouTube or Flickr. Different implementations of a DHT already exist, such as Chord [14] and Pastry [11].

A disadvantage of a DHT is that it only offers content lookup when the exact keyword is known. However, users want to be able to search through content using multiple keywords and range queries. Before end-users can be given the ability to search through the dataset of personal content, DHT architectures and algorithms have to be improved. In this paper we focus on improving the performance of lookup requests in DHTs. An important aspect is that some keywords are more popular than others. Nodes responsible for popular keywords need to handle more requests than other nodes, which results in so-called hotspots in the DHT. To reduce the hotspot problem and optimize the lookup performance, we introduce a caching layer on top of a DHT.

The remainder of this article is organized as follows. Section II gives an overview of related work. Section III provides an overview of the caching architecture, and Section IV introduces the caching algorithm. The validation and evaluation of the caching algorithm are provided in Section V. Finally, we conclude this paper in Section VI.

2009 First International Conference on Advances in P2P Systems. 978-0-7695-3831-0/09 $26.00 © 2009 IEEE. DOI 10.1109/AP2PS.2009.11