Resource Discovery Approach to Support a QoS-aware DHT-based Caching Architecture

David Castro, Víctor M. Gulías, Henrique Ferreiro and Carlos Abalde
MADS Group, Computer Science Department
University of A Coruña, A Coruña, Spain
Email: {dcastrop, gulias, hferreiro, cabalde}@udc.es

Abstract—Though DHTs are widely accepted as a key building block for next-generation large-scale decentralized systems, their lack of flexibility for efficient non-identifier-based lookups is a well-known problem. This paper presents a resource discovery service that tackles the issues identified in a decentralized and distributed DHT-based caching architecture for media content distribution. The proposed approach supports complex queries on dynamic peer resources. The discovery service is layered over an underlying DHT overlay network which connects all the peers. It combines a spanning tree, built bottom-up by mapping DHT peers to their parents, with routing indices which allow peers to efficiently look up other peers matching some resource constraints. Measurements from simulation as well as from a real implementation are analyzed.

Index Terms—resource discovery; P2P; DHT; content distribution.

I. INTRODUCTION

Much recent work on designing scalable and decentralized large-scale distributed behaviors has focused on distributed hash tables (DHTs). DHTs are now a powerful building block for the design of distributed systems, offering a number of well-known advantages over earlier unstructured peer-to-peer (P2P) architectures such as Napster [1] or Gnutella [2].

Though DHTs are widely accepted as a key building block for next-generation large-scale decentralized systems, their lack of flexibility for efficient non-identifier-based lookups is a well-known problem. Resource discovery techniques for DHTs address this issue and, in particular, support lookups on the DHT structure itself rather than only on the data stored in it.
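The lookup limitation described above can be illustrated with a toy example. The following Python sketch is not part of the proposed system: the in-memory `table`, the attribute name `free_bw` and the helper names are assumptions made purely for illustration. It shows that a hash-keyed table answers identifier lookups directly, whereas a query on a peer attribute has to inspect every entry.

```python
import hashlib

def dht_key(name):
    """Hash a name to an identifier, in the way a DHT derives its keys."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

# Toy stand-in for a DHT: identifier -> peer state (hypothetical attributes).
table = {
    dht_key(f"peer-{i}"): {"name": f"peer-{i}", "free_bw": 10 * i}
    for i in range(10)
}

def lookup(name):
    """Identifier-based lookup: a single access by hashed key."""
    return table.get(dht_key(name))

def scan(predicate):
    """Attribute-based query: hashing gives no help, so every entry is inspected."""
    inspected = 0
    hits = []
    for state in table.values():
        inspected += 1
        if predicate(state):
            hits.append(state["name"])
    return sorted(hits), inspected

hits, inspected = scan(lambda state: state["free_bw"] >= 70)
print(lookup("peer-3")["free_bw"], hits, inspected)
```

In a real overlay, the scan would translate into contacting every peer, which is the cost a resource discovery layer aims to avoid.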
A resource discovery service has to find peers holding available, changing resources which match some user-defined criteria (a query). Depending on the service, the expressiveness of those queries ranges from single-resource exact-match queries, through multi-attribute range queries, to arbitrary queries (semantic search).

In this paper, we deal with the resource discovery problems motivated by the particular QoS-related requirements of a decentralized and distributed caching architecture for multimedia content distribution. In this case, a distributed scheduler requires several related, dynamic pieces of information about resource usage on peers (network and disk bandwidth, CPU usage, disk occupation, media content and foreseen availability, ...) to properly balance streaming, pre-fetching and inter-node copying of media.

Our proposed approach, an evolution of the one presented in [3], is layered over an underlying DHT overlay network which connects all the peers. It combines a spanning tree, built bottom-up by mapping DHT peers to their parents, with routing indices which allow peers to efficiently look up other peers in the DHT overlay network matching some resource constraints. This DHT-agnostic resource discovery approach (i) does not alter the underlying DHT behavior, (ii) scales to large wide-area systems, (iii) relies neither on centralized indexes nor on super-peers, (iv) tracks both relatively static and frequently changing resources, (v) supplies a complete language to express aggregation, exact-match and multi-attribute range queries, (vi) provides an extensive set of user-defined parameters to adapt its behavior to different environments, and (vii) is flexible enough to adapt to multi-administrative-domain environments. In addition to simulation results, a real implementation has been developed, deployed and benchmarked in order to better understand the behavior of the proposed algorithm.
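The combination of a bottom-up spanning tree and routing indices can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the algorithm of this paper: the parent mapping `parent_id` (clearing the lowest set bit of the identifier), the contiguous identifiers 0..n-1, the per-attribute (min, max) summaries used as routing indices, and the attribute names `cpu_free` and `disk_bw` are all hypothetical. For brevity the search starts at the tree root, whereas the approach described above lets any peer issue a lookup.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    peer_id: int
    resources: dict                               # attribute -> current value
    children: list = field(default_factory=list)  # ids of child peers
    index: dict = field(default_factory=dict)     # child id -> subtree summary

def parent_id(peer_id):
    """Hypothetical parent mapping: clear the lowest set bit (root is 0)."""
    return peer_id & (peer_id - 1)

def merge(a, b):
    """Combine two summaries of the form attribute -> (min, max)."""
    return {k: (min(a[k][0], b[k][0]), max(a[k][1], b[k][1])) for k in a}

def build(resources_by_id):
    """Build the tree bottom-up; assumes ids are exactly 0..n-1."""
    peers = {i: Peer(i, r) for i, r in resources_by_id.items()}
    # A parent id is always smaller than its child's id, so descending
    # order guarantees children are summarized before their parent.
    for i in sorted(peers, reverse=True):
        summary = {k: (v, v) for k, v in peers[i].resources.items()}
        for child in peers[i].children:
            summary = merge(summary, peers[i].index[child])
        if i != 0:
            parent = peers[parent_id(i)]
            parent.children.append(i)
            parent.index[i] = summary  # routing index entry for this subtree
    return peers

def satisfies(resources, query):
    """Exact check of a multi-attribute range query on one peer."""
    return all(lo <= resources[k] <= hi for k, (lo, hi) in query.items())

def may_match(summary, query):
    """Necessary (not sufficient) test: every queried range overlaps."""
    return all(summary[k][0] <= hi and summary[k][1] >= lo
               for k, (lo, hi) in query.items())

def discover(peers, query, start=0):
    """Descend only into subtrees whose routing index may hold a match."""
    found, stack = [], [start]
    while stack:
        peer = peers[stack.pop()]
        if satisfies(peer.resources, query):
            found.append(peer.peer_id)
        stack.extend(c for c in peer.children if may_match(peer.index[c], query))
    return sorted(found)

values = [(0.1, 10), (0.9, 50), (0.2, 20), (0.8, 90),
          (0.3, 30), (0.7, 80), (0.4, 40), (0.6, 95)]
peers = build({i: {"cpu_free": c, "disk_bw": b} for i, (c, b) in enumerate(values)})
query = {"cpu_free": (0.5, 1.0), "disk_bw": (75, 100)}
print(discover(peers, query))  # prints [3, 5, 7]
```

In this example the subtree rooted at peer 1 is never visited, because its summarized `disk_bw` range does not overlap the queried one. Range summaries are only one possible form of routing index; they trade precision for compactness, so a subtree may be visited without yielding any match.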
The remainder of this paper is organized as follows. Section II presents an overview of the scenario that motivates this work and sketches the proposed algorithm. Section III shows measurements from both simulation and a real deployment of the resource discovery service. Section IV gives a brief introduction to the relevant state of the art, focusing on DHT-based techniques, and relates it to our work. Finally, we conclude.

II. RESOURCE DISCOVERY SERVICE DESCRIPTION

What follows is a description of the system and its motivation.

A. Motivation

In [4], a distributed video-on-demand server architecture is presented. In order to achieve a large aggregated throughput and storage capacity, media content is distributed across a network of peers structured into several distribution levels, striving to bring media closer to the final users. Intermediate levels act as a multi-level distributed content cache, exploiting the intrinsic nature and the locality of video distribution. That includes both temporal locality (popular media, such as

2009 First International Conference on Emerging Network Intelligence. 978-0-7695-3835-8/09 $26.00 © 2009 IEEE. DOI 10.1109/EMERGING.2009.19