Jelly: A Dynamic Hierarchical P2P Overlay Network with Load Balance and Locality

Richard Hsiao and Sheng-De Wang
Department of Electrical Engineering
National Taiwan University, Taipei 106, TAIWAN

Abstract

P2P systems based on distributed hash tables (DHTs), such as CAN, Chord, Pastry, and Tapestry, use uniform hash functions to balance load across participant nodes. But distributing nodes evenly in the virtual space destroys the locality between participant nodes. Topology-based hierarchical overlay networks such as Grapes exploit physical distance information among the nodes to construct a two-layered hierarchy, which greatly improves locality but damages the load balance property of the original DHTs. In this paper, we propose a dynamic P2P overlay infrastructure, called Jelly, that achieves both load balancing and locality. Its design is based on the hierarchical overlay and uses a DHT as its routing algorithm. Because load balance in a hierarchical overlay depends on whether the virtual hierarchy is balanced, Jelly uses a node joining mechanism as a fine-tuning tool and a dynamic checking mechanism as a coarse-tuning tool to balance the hierarchy. We also find that the average number of routing hops is a practical metric for evaluating the network size, and that it is useful for Jelly’s dynamic mechanism.

1. Introduction

In recent years, peer-to-peer (P2P) systems have become a burgeoning research topic in large distributed systems. Gnutella [1] and Napster [2] are the most famous peer-to-peer file sharing systems, but both of them have scalability problems. To address this, distributed hash tables (DHTs) have become a fundamental building block for peer-to-peer overlay networks; CAN [3], Chord [4], Pastry [5], and Tapestry [6] are well-known examples of these infrastructures.
Many applications are layered above DHTs, such as file sharing systems [7] [8] [9], event notification services [10] [11], and application-layer multicast [12] [13] [14]. Although each of these DHTs has its own location and routing algorithms, they all share one feature: they use consistent hashing (with a hash function such as SHA-1) to distribute the participant nodes and objects uniformly in the virtual space. Under general conditions, these systems achieve fairly good load balance.

But the primitive DHT schemes have a significant disadvantage: they may violate the locality property. During the locating and routing process, a message chooses its next hop regardless of the physical topology. This lengthens both the response time and the overall physical path of a lookup. To address this problem, DHTs should take into account the relative physical positions of the participant nodes. All of these systems have adopted similar approaches, such as [18], that exploit locality by measuring a proximity metric like round trip time (RTT) or the number of IP-level hops. This improvement ensures that the next hop is a relatively close node on the underlying network among those that match the routing condition, but the physical distance between the node looking for an object and the node storing that object can still be long.

Grapes [15] provides a hierarchical virtual network infrastructure built from physical topology information. It has a two-layered overlay network: the upper layer is called the super-network and the lower layer the sub-network. Any DHT routing algorithm can be used in either layer. Each sub-network has a leader that takes part in super-network routing and manages the sub-network. Physically nearby nodes form a sub-network, and during each super-network query the leader caches the object in its sub-network.
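The two-layer lookup with leader caching described above can be sketched as follows. This is only an illustrative simplification, not the Grapes implementation: each sub-network is modelled as a single dictionary rather than a DHT of its own, the super-network uses a Chord-like "first leader at or after the key" rule (the text allows any DHT here), and the names `ring_id`, `SubNetwork`, and `TwoLayerOverlay` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def ring_id(name: str) -> int:
    """Map a name onto the identifier space using SHA-1 (160-bit integer)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class SubNetwork:
    """A group of physically nearby nodes, represented by its leader."""

    def __init__(self, leader: str, members: List[str]):
        self.leader = leader
        self.members = members
        self.store: Dict[str, str] = {}  # objects this sub-network is responsible for
        self.cache: Dict[str, str] = {}  # objects cached after super-network queries


class TwoLayerOverlay:
    """Super-network of leaders; assumed successor-style key assignment."""

    def __init__(self, subnets: List[SubNetwork]):
        # Order sub-networks by the hashed ID of their leader.
        self.subnets = sorted(subnets, key=lambda s: ring_id(s.leader))

    def responsible(self, key: str) -> SubNetwork:
        """Return the sub-network whose leader is the first at or after the key."""
        kid = ring_id(key)
        for subnet in self.subnets:
            if ring_id(subnet.leader) >= kid:
                return subnet
        return self.subnets[0]  # wrap around the identifier space

    def put(self, key: str, value: str) -> None:
        self.responsible(key).store[key] = value

    def lookup(self, origin: SubNetwork, key: str) -> Tuple[Optional[str], str]:
        """Look in the local sub-network first, then query the super-network."""
        if key in origin.store:
            return origin.store[key], "sub-network"
        if key in origin.cache:
            return origin.cache[key], "sub-network"
        value = self.responsible(key).store.get(key)
        if value is not None:
            origin.cache[key] = value  # cache the result for nearby nodes
        return value, "super-network"
```

In this sketch, the first lookup for a remote object from a given sub-network goes through the super-network, and a repeated lookup from the same sub-network is answered from the local cache.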
As a result of this caching, a node can find the object in its own sub-network with high probability. Because the physical distance between any pair of nodes in a sub-network is short, this infrastructure can greatly reduce the lookup distance.

Although a hierarchical overlay network like Grapes can greatly improve the locality of DHTs, it does not preserve load balance. If the DHT provides load balance, each leader in the super-network is assigned nearly the same load. After the lower-layer mapping, however, the load is no longer balanced across the nodes of the entire system: the larger a sub-network is, the lighter the load assigned to each of its subnodes. Grapes does not provide any mechanism to adjust the size of a sub-network; as a result, its node joining algorithm produces some extremely large sub-networks and a significant number of sub-networks with relatively few subnodes.

To address both the load balancing and locality problems, we propose Jelly, a dynamic hierarchical overlay network. Our main goal is to construct and maintain a well-balanced two-layered overlay network (one in which the size of every sub-network falls within a given range), so that each participant node is assigned a similar load. Jelly’s node joining mechanism is similar to that of Grapes. The difference is that a newly joined node checks not only whether the physical distance between itself and each leader on the path of the insertion process is shorter than a threshold, but also the size of the sub-network that each leader manages. If the size is larger than the given threshold, it is not appropriate to add one more node to this sub-network, because doing so may worsen the imbalance of the entire hierarchy. Therefore, only when the newly-joined node finds a

Proceedings of the 24th International Conference on Distributed Computing Systems Workshops (ICDCSW’04) 0-7695-2087-1/04 $20.00 © 2004 IEEE