Caching Strategy for Scalable Lookup of Personal Content
Niels Sluijs, Tim Wauters, Bart De Vleeschauwer, Filip De Turck, Bart Dhoedt and Piet Demeester
Department of Information Technology (INTEC)
Ghent University – IBBT
Gaston Crommenlaan 8, Bus 201, 9050 Ghent, Belgium
{firstname.lastname}@intec.ugent.be
Abstract—Today’s trend is to create and share personal
content, such as music files, digital photos and digital movies.
The result is an explosive growth of a user’s personal content
archive. Managing such an often distributed collection
becomes a complex and time-consuming task, which indicates
the need for a personal content management system that
provides storage space transparently, is quality-aware, and is
available to end-users at any time and in any place. A solution
that fulfills this need is a Personal Content Storage Service
(PCSS). A key feature of a PCSS is the ability to search
worldwide through the dataset of personal files. Due to the
extremely large size of the dataset of personal content, a
centralized approach is no longer feasible; therefore the PCSS
uses a structured peer-to-peer network: the Distributed Hash
Table (DHT). To further improve lookup performance, a
caching layer is placed between the application layer and the
DHT. In this article we present the caching layer
and introduce the Request Times Distance (RTD) caching
algorithm, which uses popularity and distance metrics to
increase the lookup performance. By extending the RTD
algorithm with a sliding window and cooperative caching, we
obtain a solution that is more efficient than standard caching
algorithms.
The cooperative RTD caching algorithm is evaluated using the
PlanetSim simulation framework and shows a performance
increase of up to 16% compared to the Least Frequently Used
(LFU) caching algorithm.
Keywords-caching strategy; distributed hash table;
performance analysis; personal content; scalability
I. INTRODUCTION
An important trend today is to create personal content, such
as text documents, digital photos, music files and personal
movies, and to share it with others. End-users can share their
personal content using websites such as YouTube¹ and
Flickr², for personal movies and digital photos respectively.
Since the number of personal files grows enormously,
managing a personal archive has become a complex and
time-consuming task. Nevertheless, end-users expect to be
able to locate, control, access and share their personal content
from any device, anywhere and at any time. However, current
systems that provide storage for personal content, such as
YouTube and Flickr, impose limitations in order to cope with
the workload. In many cases the file size is limited, only
certain file formats are accepted, or personal content cannot
be accessed from different types of devices, such as desktop
computers, laptops, PDAs (Personal Digital Assistants) and
mobile phones. Consequently, these systems are not able to
offer a truly quality-aware and scalable solution for
transparent storage of personal content.
¹ http://www.youtube.com/
² http://www.flickr.com/
A networked solution that offers end-users storage space in
a transparent manner, is accessible from different types of
devices and is able to cope with the expected workload is a
Personal Content Storage Service (PCSS) [13]. A successful
deployment of a PCSS requires research into several aspects,
such as user-centric security, content presence, content
replica management and content indexing.
An essential feature of a PCSS is the ability to search
worldwide through the dataset of personal files; therefore,
this article focuses on content indexing.
A data structure that allows searching through extremely
large datasets is a Distributed Hash Table (DHT). A DHT is
a (structured) peer-to-peer network that offers scalable
lookup, similar to a hash table. A <key, value>-pair is stored
in the DHT, and every node participating in the DHT is
able to efficiently locate the values that correspond to a given
key. In the case of a PCSS, the key is, for instance, a file name
or a tag/keyword that describes the personal file. A value in a
PCSS is a link to the location where the file is stored, for
instance YouTube or Flickr. Several DHT implementations
already exist, such as Chord [14] and Pastry [11].
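To make the <key, value> mapping concrete, the following is a minimal, single-process sketch of how a Chord-style DHT could decide which node stores a key: keys and node names are hashed with SHA-1 onto one identifier ring, and a key is stored on its successor, the first node whose identifier is equal to or follows the key's identifier. The class `ToyDHT`, its methods and the link strings are hypothetical illustrations, not part of the PCSS; the sketch also assumes global knowledge of all node identifiers and models neither routing nor churn.

```python
import bisect
import hashlib


def ring_id(name: str) -> int:
    """Map a key or node name onto a 160-bit identifier ring using SHA-1 (as in Chord)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Hypothetical single-process model of DHT key placement.

    Assumes every participant knows all node ids; no routing tables, churn or replication.
    """

    def __init__(self, node_names):
        self.node_ids = sorted({ring_id(n) for n in node_names})
        self.store = {nid: {} for nid in self.node_ids}  # node id -> {key: [values]}

    def responsible_node(self, key: str) -> int:
        """Successor rule: the first node id >= the key id, wrapping around the ring."""
        idx = bisect.bisect_left(self.node_ids, ring_id(key))
        return self.node_ids[idx % len(self.node_ids)]

    def put(self, key: str, value: str) -> None:
        """Store a <key, value>-pair on the node responsible for the key."""
        node = self.responsible_node(key)
        self.store[node].setdefault(key, []).append(value)

    def get(self, key: str) -> list:
        """Return all values stored under exactly this key (empty list if none)."""
        return self.store[self.responsible_node(key)].get(key, [])


dht = ToyDHT([f"node-{i}" for i in range(8)])
dht.put("beach", "link-to-video-on-site-a")
dht.put("beach", "link-to-photo-on-site-b")
print(dht.get("beach"))  # ['link-to-video-on-site-a', 'link-to-photo-on-site-b']
```

In a real Chord deployment a node does not know every other node; it resolves `responsible_node` in O(log N) hops using its finger table. The sketch only illustrates where a <key, value>-pair ends up.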
A disadvantage of a DHT is that it only supports content
lookup when the exact keyword is known. However, users
want to be able to search through content using multiple
keywords and range queries. Before end-users can search
through the dataset of personal content in this way, DHT
architectures and algorithms have to be improved.
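The exact-match limitation follows directly from hashing. The short sketch below, assuming SHA-1 identifiers and a plain dictionary standing in for the DHT (the names `key_id`, `index` and `exact_lookup` are hypothetical), shows that only the identical keyword reaches the stored value, and that keywords which are adjacent in a range query are hashed to scattered identifiers, so there is no contiguous part of the ring to ask.

```python
import hashlib


def key_id(keyword: str) -> int:
    """Hash a keyword to a 160-bit identifier, as SHA-1-based DHTs such as Chord do."""
    return int.from_bytes(hashlib.sha1(keyword.encode("utf-8")).digest(), "big")


# Toy index standing in for the DHT: identifier -> stored values (hypothetical links).
index = {key_id("beach"): ["link-a"]}


def exact_lookup(keyword: str) -> list:
    """A DHT lookup can only find values stored under the identical key."""
    return index.get(key_id(keyword), [])


found = exact_lookup("beach")    # exact keyword: value found
missed = exact_lookup("Beach")   # different spelling: different id, nothing found
prefix = exact_lookup("bea")     # partial keyword: nothing found

# A range query over consecutive keys: the hashed ids are spread pseudo-randomly
# over the identifier space, so neighbouring keys generally land on unrelated nodes.
for year in range(2005, 2010):
    print(f"photo-{year}", hex(key_id(f"photo-{year}"))[:12])
```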
In this paper we focus on improving the performance of
lookup requests in DHTs. An important aspect is that some
keywords are more popular than others. Nodes responsible
for popular keywords need to handle more requests than
others, which results in so-called hotspots in the DHT. To
mitigate the hotspot problem and improve the lookup
performance, we introduce a caching layer on top of the DHT.
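The following sketch illustrates the hotspot problem and why a caching layer helps. It assumes a Zipf-like keyword popularity and places a single cache in front of all DHT lookups. The cache uses plain LFU eviction, the baseline mentioned in the abstract, not the RTD algorithm of this paper. `LFUCache`, `zipf_requests`, `dht_load` and all parameter values are hypothetical. The counter `dht_load` records, per keyword, how many lookups actually reach the node responsible for that keyword.

```python
import random
from collections import Counter


class LFUCache:
    """Minimal LFU cache in front of DHT lookups (illustrative baseline, not RTD).

    Request counts are tracked for every key seen, cached or not, so a key that
    becomes popular can displace a less-requested cached key.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values = {}       # cached key -> value
        self.freq = Counter()  # key -> number of requests seen so far

    def get(self, key, fetch):
        """Return the value for key; call fetch(key), i.e. the DHT lookup, only on a miss."""
        self.freq[key] += 1
        if key in self.values:
            return self.values[key]
        value = fetch(key)
        if self.capacity <= 0:
            return value
        if len(self.values) < self.capacity:
            self.values[key] = value
        else:
            # Evict the least frequently requested cached key if the new key is more popular.
            victim = min(self.values, key=lambda k: self.freq[k])
            if self.freq[victim] < self.freq[key]:
                del self.values[victim]
                self.values[key] = value
        return value


def zipf_requests(n_keys, n_requests, s=1.0, seed=42):
    """Generate keyword requests with Zipf-like popularity (an assumed workload)."""
    rng = random.Random(seed)
    keys = [f"kw{i}" for i in range(n_keys)]
    weights = [1.0 / rank ** s for rank in range(1, n_keys + 1)]
    return rng.choices(keys, weights=weights, k=n_requests)


def dht_load(requests, cache=None):
    """Count, per keyword, how many lookups reach the DHT node responsible for it."""
    load = Counter()

    def fetch(key):
        load[key] += 1
        return f"value-of-{key}"

    for key in requests:
        if cache is None:
            fetch(key)
        else:
            cache.get(key, fetch)
    return load


requests = zipf_requests(n_keys=1000, n_requests=20000)
without_cache = dht_load(requests)
with_cache = dht_load(requests, LFUCache(capacity=50))
print("hottest keyword, no cache: ", without_cache.most_common(1))
print("hottest keyword, LFU cache:", with_cache.most_common(1))
```

Without a cache, the node responsible for the most popular keyword receives every request for it. With even a small cache, most of those requests are answered before they reach the DHT, which spreads the load more evenly.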
The remainder of this article is organized as follows.
Section II gives an overview of related work, Section III
presents the caching architecture and Section IV introduces
the caching algorithm. Section V validates and evaluates the
caching algorithm, and finally, Section VI concludes this
paper.
2009 First International Conference on Advances in P2P Systems
978-0-7695-3831-0/09 $26.00 © 2009 IEEE
DOI 10.1109/AP2PS.2009.11