IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 16, NO. 1, FIRST QUARTER 2014

Survey on Load Balancing in Peer-to-Peer Distributed Hash Tables

Pascal Felber, Member, IEEE, Peter Kropf, Member, IEEE, Eryk Schiller, and Sabina Serbu

Abstract—Peer-to-peer systems represent a radical shift from the classical client-server paradigm in which a centralized server processes requests from all clients. In a peer-to-peer (P2P) system, every “peer” can play the role of a client and a server at the same time, hence sharing responsibilities among all parties. As in practice some peers or connecting links may be heavily loaded in comparison to others, load balancing algorithms are necessary to ensure a fair distribution of the load among participating peers. In this survey, we present load management solutions in P2P systems. According to the level at which they operate, we classify the different approaches into three categories: object placement, routing protocol, and underlay. The first two approaches tackle information lookup and retrieval in the overlay network, while the last one addresses traffic imbalance at the level of the underlying network.

Index Terms—Load balancing, peer-to-peer, distributed hash tables, decentralized systems.

I. INTRODUCTION

PEER-TO-PEER (P2P) systems are a class of decentralized distributed systems in which each participating node acts as both a client and a server for the other participating peers. Information storage, lookup and retrievals generate load that is shared among all peers. This is obviously an advantage from the point of view of reliability, robustness and scalability. P2P systems do, however, require load balancing algorithms able to fairly distribute the load among all participating peers, in order to avoid situations in which some peers or links would experience much heavier load than others.

This survey focuses on IP-based P2P systems, wherein a peer may communicate directly with every other peer.
Hence, we do not consider mobile ad-hoc networks. To keep the coverage broad, we make no assumptions about the network management facilities provided by the underlying infrastructure.

A. Definitions

Before discussing load balancing mechanisms, we have to define the meaning of “load” and associated terms. In the context of P2P systems, the load may relate to objects, peers, or links. We denote by object a piece of information stored in the system; its popularity is the frequency at which it is accessed. The object load is therefore induced by its size and popularity. Each node (i.e., peer) has a limited capacity in terms of available storage space, processing time, or bandwidth [1]. The request load on a node is caused by the queries received for objects stored locally. It covers all aspects related to communication costs (i.e., sent and received messages) and computational power spent for request processing. As peers may forward messages to other peers during information lookup, they are also exposed to a routing load for queries that only traverse them. The combination of both types of communication activities is referred to as traffic load [2] in the overlay network.

Manuscript received July 29, 2012; revised January 11, 2013.
The authors are with the University of Neuchâtel, Switzerland (e-mail: eryk.schiller@unine.ch).
Digital Object Identifier 10.1109/SURV.2013.060313.00157

[Figure 1: taxonomy tree rooted at Load Balancing, with branches Object Placement, Routing, and Underlay, and leaves Namespace, Virtual servers, Multiple hashes, Caching & Replication, Link reorganization, Path redundancy, Topology-based IDs, Proximity neighbor selection, and Proximity routing.]
Fig. 1. Taxonomy of surveyed load balancing solutions.

B. Roadmap

This survey covers the recent work on load balancing in P2P systems based on distributed hash tables (DHTs). We classify the different approaches as depicted in Figure 1. Section II provides a short introduction to DHTs. In Section III, we introduce and discuss various causes of load imbalance.
Section IV presents approaches that exploit object placement. Section V focuses on balancing traffic. Section VI discusses the use of information on the underlying network structure to optimize overlay communications. Finally, Sections VII and VIII end the survey with a short discussion and concluding remarks.

C. Related Surveys

Load balancing in distributed hash tables (DHTs) shares common challenges, and in some respects solutions, with other domains such as network load balancing, in which multiple interfaces are used for simultaneous data transmissions [3], or even multiprocessor scheduling problems [4] that must assign tasks to processors to obtain the lowest completion time. Yet, because of their decentralized structure, DHT overlays require dedicated load balancing algorithms that take into account

1553-877X/14/$31.00 © 2014 IEEE