Community Formation and Search in P2P: A Robust and Self-Adjusting Algorithm

Tathagata Das, Subrata Nandi, and Niloy Ganguly
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India-721302
Email: {snandi,niloy}@cse.iitkgp.ernet.in

Abstract—The existing literature treats decentralized content-based P2P community formation and community-based search as separate problems. In contrast, this paper proposes a novel search algorithm that can both form the community structure and search it with maximum efficiency. The key contribution is a self-organized and adaptive search algorithm: as the community topology evolves with time, the search process adapts automatically to provide the best performance. It makes an automatic transition from the exploratory phase to the search phase by estimating the global state of the communities using a local control parameter. Moreover, we show that the strategy is robust enough to improve search performance even under node churn, although overall performance degrades gracefully. We consider realistic power-law distributions for node degrees and information profiles. The proposed search strategy is more than twice as efficient as a pure random walk with proliferation on the same network.

Index Terms—unstructured search, peer-to-peer, networks, community, power-law.

I. INTRODUCTION

Measurement studies on content-sharing peer-to-peer (P2P) networks show that contents belonging to certain categories are significantly more popular than the rest, and that the generated queries are biased towards the popular contents [1], [2]. Hence, search performance can increase significantly if the P2P overlay is organized in the form of loosely structured content-based communities¹.
Such communities are analogous to human communities found in social networks: the nodes of a community are densely connected to one another, while connections between communities are sparse. As a result, search algorithms that take advantage of the community-based topology can narrow down the search space by directing the query towards the required communities, with the following objective: ask only those who probably know it [3].

¹ A subset of peers having contents with a high degree of similarity in a certain attribute (like content).

In this paper, we address issues related to the self-formation and management of content-based P2P communities, along with efficient community-aware query propagation over them. Due to the dynamic nature of large-scale P2P networks, the algorithms need to be decentralized, self-adjusting, and robust against rapidly changing system environments such as churn (nodes joining and leaving rapidly); designing such algorithms is challenging. The basic idea of decentralized community formation and search has been explored in [4], [3], [5], [6], [7], [8]. In contrast to the existing works, which deal with community formation and search separately, this paper proposes a novel search algorithm that can both form the community structure and search it with maximum efficiency.

The proposed search algorithm has two phases: an exploratory phase and a search phase. In the exploratory phase, the search concentrates on exploring the network in order to find nodes having similar contents, and probabilistically establishes community links between them, resulting in the formation of a community structure. In the search phase, the search focuses its effort on searching the similar communities that have already been formed. The key contribution of this paper is a self-organized and adaptive search algorithm that makes an automatic transition from one phase to the other, tuned by a time-varying local control parameter.
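The interplay between the two phases can be pictured with a small sketch. It is purely illustrative and is not the algorithm defined in this paper: the function names, the split of a node's neighbors into `community_links` and `other_links`, and the choice of control parameter (the fraction of a node's degree budget already occupied by community links) are all assumptions made for this example. The only idea taken from the text is that a locally computed value shifts a walker from random exploration towards greedy use of community links.

```python
import random


def update_greedy_prob(community_links, max_degree):
    """Hypothetical local control parameter (an assumption of this sketch).

    Returns the fraction of the node's degree budget already used by
    community links, clamped to [0, 1]. A low value keeps the walker
    exploratory; a high value makes it favor community links.
    """
    if max_degree <= 0:
        return 0.0
    return min(1.0, len(community_links) / max_degree)


def choose_next_hop(community_links, other_links, greedy_prob, rng=random):
    """Pick the next node for a query walker.

    With probability greedy_prob the walker follows a community link
    (search-phase behavior); otherwise it takes a random non-community
    link (exploratory behavior). Returns None if the node has no links.
    """
    if community_links and rng.random() < greedy_prob:
        return rng.choice(community_links)
    # Fall back to community links if there is nothing else to explore.
    candidates = other_links or community_links
    return rng.choice(candidates) if candidates else None


# Usage: a node with 2 community links out of a degree budget of 4.
community, others = ["a", "b"], ["x", "y"]
p = update_greedy_prob(community, max_degree=4)
next_hop = choose_next_hop(community, others, p)
```

In this sketch the transition between phases is gradual rather than a hard switch: as a node accumulates community links, its walkers follow them more often, without any node needing a global view of the network.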
Moreover, we show that the strategy is robust enough to improve search performance even under node churn, although overall performance degrades gracefully.

As P2P overlay topologies have often been modeled in the literature as power-law graphs [9], [5], we consider the initial P2P overlay to follow a power-law degree distribution. The contents stored in the nodes are divided into abstract categories called information profiles, which are assumed to follow Zipf's law [2], [5]. Because the node degree and profile distributions are highly skewed, designing rewiring rules that quickly produce a connected, well-formed community topology is difficult. Hence, in this paper, the community overlay is represented as a separate network grown over the initial power-law topology, with the maximum number of neighbors each node can connect to constrained by its original degree.

In this paper, we first develop a thorough understanding of the dynamics of the basic search strategies, considering different variations of random and greedy (biased towards community links) search. Based on this understanding, we then design the self-adjusting algorithm, which consists of a healthy mix of random and greedy walking. We show that it performs far better than conventional random walk schemes. We further modify the algorithm to propose a final generalized scheme that can handle both exact and approximate match schemes.

The rest of the paper is organized as follows: Section II briefly discusses the related work. The network model and the basic search algorithms are presented in Section III