A Query-Adaptive Partial Distributed Hash Table for Peer-to-Peer Systems

Fabius Klemm, Anwitaman Datta, Karl Aberer

School of Computer and Communication Sciences
Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
{Fabius.Klemm, Anwitaman.Datta, Karl.Aberer}@epfl.ch

Abstract. The two main approaches to finding data in peer-to-peer (P2P) systems are unstructured networks, which use flooding, and structured networks, which use a distributed index. A distributed index is usually built over all keys stored in the network, whether they are queried or not. Indexing all keys is no longer feasible when indexing metadata, because the key space becomes very large. In this case we need a query-adaptive approach that indexes only the keys worth indexing, i.e., keys that are queried at least with a certain frequency. In this paper we study the cost of indexing and propose a query-adaptive partial distributed hash table (PDHT) that does not keep all keys in the index. We model and analyze a scenario to show that query-adaptive partial indexing outperforms both pure flooding and “index-everything” strategies. Furthermore, our scheme can automatically adjust the index to changing query frequencies and distributions.

Keywords: peer-to-peer (P2P), partial distributed hash table (PDHT), query-adaptive indexing, metadata.

1 Introduction

There have been several proposals for storing and retrieving data in decentralized, unreliable peer-to-peer networks. So far, most solutions have taken one of two extremes: index everything or index nothing. In unstructured networks such as Gnutella, peers resolve queries by flooding or by multiple random walks [ChRa03, LvCa02] and do not build or maintain any index. These mechanisms support arbitrary, complex search requests on metadata, since they are not restricted to specific keys when looking up values in the network. On the other hand, queries generate a large number of messages.
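The trade-off described above (index a key only if it is queried often enough) can be illustrated with a deliberately simplified toy cost model. This is a sketch under our own assumptions, not the model analyzed in this paper: we assume a network of N peers, that flooding one query costs N messages (one per peer, ignoring duplicates), that one indexed lookup costs about log2(N) routing messages, and that keeping one key in the index costs a hypothetical constant `maint` messages per time unit. All function names and parameters are hypothetical. Under these assumptions a key is worth indexing when its query frequency f satisfies maint + f·log2(N) ≤ f·N.

```python
import math


def lookup_cost(n_peers: int) -> float:
    """Assumed cost of one indexed lookup: about log2(N) routing messages."""
    return math.log2(n_peers)


def flood_cost(n_peers: int) -> float:
    """Assumed cost of one flooded query: one message per peer."""
    return float(n_peers)


def key_cost(freq: float, n_peers: int, maint: float, indexed: bool) -> float:
    """Messages per time unit spent on one key under the toy model."""
    if indexed:
        # Pay index maintenance plus a cheap lookup per query.
        return maint + freq * lookup_cost(n_peers)
    # No maintenance, but every query is flooded.
    return freq * flood_cost(n_peers)


def index_threshold(n_peers: int, maint: float) -> float:
    """Query frequency at or above which indexing is no more expensive than flooding."""
    return maint / (flood_cost(n_peers) - lookup_cost(n_peers))


def total_cost(freqs, n_peers: int, maint: float, strategy: str) -> float:
    """Total cost over all keys for 'flood', 'index_all' or 'partial'."""
    thr = index_threshold(n_peers, maint)
    total = 0.0
    for f in freqs:
        if strategy == "flood":
            indexed = False
        elif strategy == "index_all":
            indexed = True
        elif strategy == "partial":
            # Query-adaptive rule: index only sufficiently popular keys.
            indexed = f >= thr
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        total += key_cost(f, n_peers, maint, indexed)
    return total


if __name__ == "__main__":
    # Hypothetical skewed (Zipf-like) query frequencies over 10,000 keys.
    freqs = [10.0 / rank for rank in range(1, 10001)]
    for s in ("flood", "index_all", "partial"):
        print(s, round(total_cost(freqs, 1024, 100.0, s)))
```

Because the partial strategy picks, per key, whichever option is cheaper in this idealized model, its total can never exceed either extreme. The model ignores how peers learn whether a key is indexed and how frequencies are measured, both of which a real partial index must handle.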
In structured peer-to-peer networks [Aber01, RaFr01, RoDr01, StMo01], also called distributed hash tables (DHTs), peers collaborate to construct and maintain a distributed index. This index allows very efficient searches, but only on the indexed keys [HaHe02]. Moreover, traditional DHTs do not take the query distribution into account and devote equal resources to all keys. These drawbacks of DHTs are discussed in [ChRa03].