Adaptively Routing P2P Queries Using Association Analysis

Brian D. Connelly, Christopher W. Bowron, Li Xiao, Pang-Ning Tan, and Chen Wang
Department of Computer Science and Engineering
Michigan State University
East Lansing, MI 48824 US
{connel42,bowronch,lxiao,ptan,wang}@cse.msu.edu

Abstract

Unstructured peer-to-peer networks have become a very popular method for content distribution in the past few years. By not enforcing strict rules on the network's topology or content location, such networks can be created quickly and easily. Unfortunately, because of the unstructured nature of these networks, query messages must be flooded to nodes in the network in order to find content, which results in a large amount of traffic. This work borrows the technique of association analysis from the data mining community and extends it to intelligently forward queries through the network. Because queries are forwarded to only a small subset of a node's neighbors, the number of times those queries are propagated is also reduced, which results in considerably less network traffic. These savings enable the networks to scale to much larger sizes, which allows more content to be shared, more redundancy to be added to the system, and more users to take advantage of such networks.

I. Introduction

The popularity and number of peer-to-peer (P2P) networks have exploded in the past several years. They have proved to be a viable method for disseminating data across a network. Aside from the legal issues faced by a few existing networks regarding the distribution of copyrighted material, P2P networks also serve many useful legitimate purposes, such as load balancing, providing more flexible and up-to-date routing information [1], managing voice traffic [2], and offering efficient downloads of free software [3]. Many of the networks in use today follow the unstructured peer-to-peer model, which was first widely used in the Gnutella [4] network.
These networks do not impose any rules as to how the nodes organize themselves or where shared content is located. This has the benefit of allowing nodes to join and leave without significantly affecting the entire system. One disadvantage of this approach, however, is that the location of content shared on the network is not known. In order for a user to find a particular piece of content, he or she "floods" the network with query messages. In flooding, a query message is sent to all of a peer's neighbors, which, in turn, forward the query to all of their neighbors, and so on. This behavior results in the query reaching all nodes, so if any node shares content that matches the user's query, it will be found. Because flooding creates so many messages, the amount of traffic on the network grows considerably with each node that joins: the new node propagates every query it receives to each of its neighbors, and it also issues new queries of its own, each of which generates many flooded query messages. The end result of this large volume of traffic is that current unstructured P2P networks reach a limit in the number of users who can concurrently use the system.

This paper presents a new approach to limiting the number of queries that are flooded in the network. This approach uses the concept of association analysis, which has been studied extensively in the data mining community. By extending association analysis to include measures of quality for rule sets and driving the rule generation process by feedback, nodes intelligently forward query messages to a subset of neighbors that are likely to continue forwarding queries towards nodes that share the desired content.
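To make the cost of flooding concrete, the following is a minimal sketch of the flooding behavior described above, not the simulator or protocol implementation used in this work. The five-node topology, the `flood` function, the hop limit `ttl`, and the rule that a node forwards a given query only the first time it sees it are all illustrative assumptions; the text above does not specify them.

```python
from collections import deque


def flood(graph, source, ttl):
    """Flood one query from `source`; return (nodes reached, messages sent).

    Assumptions (hypothetical, for illustration only):
    - `graph` maps each node to a list of its neighbors.
    - `ttl` is the maximum number of hops a query may travel.
    - A node forwards a query only the first time it receives it and
      never sends it back to the neighbor it came from.
    """
    seen = {source}
    messages = 0
    # Each entry: (node holding the query, who sent it, hops remaining).
    queue = deque([(source, None, ttl)])
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # hop limit exhausted; do not forward further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue
            messages += 1  # every forwarded copy costs one message
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, node, hops_left - 1))
    return seen, messages


# Hypothetical five-node overlay with six links.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

reached, messages = flood(graph, "A", ttl=7)
print(sorted(reached), messages)
```

On this small example the query reaches all five nodes but costs 8 messages, twice the 4 that would be needed if each node received the query exactly once. The redundant copies are the ones that arrive at nodes that have already seen the query.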
Because this selective forwarding significantly reduces the number of query messages that are flooded while maintaining the ability to successfully locate content, the overall traffic on the network is decreased, allowing more users to make use

Proceedings of the 2006 International Conference on Parallel Processing (ICPP'06) 0-7695-2636-5/06 $20.00 © 2006