Adaptively Routing P2P Queries Using Association Analysis

Brian D. Connelly, Christopher W. Bowron, Li Xiao, Pang-Ning Tan, and Chen Wang
Department of Computer Science and Engineering
Michigan State University
East Lansing, MI 48824 US
{connel42,bowronch,lxiao,ptan,wang}@cse.msu.edu

Abstract

Unstructured peer-to-peer networks have become a very popular method for content distribution in the past few years. By not enforcing strict rules on the network's topology or content location, such networks can be created quickly and easily. Unfortunately, because of the unstructured nature of these networks, finding content requires flooding query messages to the nodes in the network, which results in a large amount of traffic. This work borrows the technique of association analysis from the data mining community and extends it to intelligently forward queries through the network. Because queries are forwarded to only a small subset of a node's neighbors, the number of times those queries are propagated is also reduced, which results in considerably less network traffic. These savings enable the networks to scale to much larger sizes, which allows more content to be shared, more redundancy to be added to the system, and more users to take advantage of such networks.

I. Introduction

The popularity and number of peer-to-peer (P2P) networks have exploded in the past several years. These networks have proved to be a viable method for disseminating data across a network. Aside from the legal issues faced by a few existing networks regarding the distribution of copyrighted material, P2P networks also serve many useful, legitimate purposes, such as load balancing, providing more flexible and up-to-date routing information [1], managing voice traffic [2], and offering efficient downloads of free software [3].

Many of the networks in use today follow the unstructured peer-to-peer model, which was first widely used in the Gnutella [4] network.
These networks do not impose any rules on how the nodes organize themselves or where shared content is located. This has the benefit of allowing nodes to join and leave without significantly affecting the rest of the system. One disadvantage of this approach, however, is that the location of content shared on the network is not known. To find a particular piece of content, a user "floods" the network with query messages. In flooding, a query message is sent to all of a peer's neighbors, which, in turn, forward the query to all of their neighbors, and so on. This behavior results in the query reaching all nodes, so if any node shares content that matches the user's query, it will be found. Flooding creates so many messages that the amount of traffic on the network grows considerably with each node that joins: the new node propagates every query it receives to each of its neighbors and also issues new queries of its own, each of which generates many flooded query messages. The end result of this large volume of traffic is that current unstructured P2P networks reach a limit on the number of users who can use the system concurrently.

This paper presents a new approach to limiting the number of queries that are flooded in the network. The approach uses the concept of association analysis, which has been studied extensively in the data mining community. By extending association analysis to include measures of quality for rule sets and driving the rule generation process by feedback, nodes intelligently forward query messages to a subset of neighbors that are likely to continue forwarding queries towards nodes that share the desired content.
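To make the cost of flooding concrete, the following is a minimal Python sketch that counts the query messages generated by a single query. It is an illustration under stated assumptions, not the protocol evaluated in this paper: the function name `flood`, the toy four-node `graph`, the duplicate-suppression behavior, and the `choose` callback (a hypothetical stand-in for any policy that picks a subset of neighbors) are all introduced here for illustration only.

```python
from collections import deque

def flood(graph, source, choose=None):
    """Count the messages sent when `source` issues one query.

    graph:  dict mapping each node to a list of its neighbors.
    choose: hypothetical forwarding policy, called as
            choose(node, candidates) and returning the neighbors
            to forward to. None means plain flooding.
    Returns (messages_sent, nodes_reached).
    """
    seen = {source}                  # nodes that have already handled the query
    queue = deque([(source, None)])  # (node, neighbor the query arrived from)
    messages = 0
    while queue:
        node, sender = queue.popleft()
        # Assumption: a node never sends the query back to its sender.
        candidates = [n for n in graph[node] if n != sender]
        targets = candidates if choose is None else choose(node, candidates)
        for nxt in targets:
            messages += 1            # every forwarded copy costs one message
            if nxt not in seen:      # assumption: duplicate copies are dropped
                seen.add(nxt)
                queue.append((nxt, node))
    return messages, seen

# Toy topology, purely illustrative.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

flood_result = flood(graph, "A")
subset_result = flood(graph, "A", choose=lambda node, cands: cands[:1])
print(flood_result[0], sorted(flood_result[1]))    # 7 ['A', 'B', 'C', 'D']
print(subset_result[0], sorted(subset_result[1]))  # 3 ['A', 'B', 'C']
```

In this toy run, plain flooding reaches all four nodes at a cost of seven messages, while naively forwarding to only the first candidate neighbor costs three messages but never reaches node D. Reducing traffic is easy; doing so without losing the ability to locate content requires choosing the subset of neighbors carefully.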
Because this significantly reduces the number of query messages that are flooded while maintaining the ability to successfully locate content, the overall traffic on the network is decreased, allowing more users to make use

Proceedings of the 2006 International Conference on Parallel Processing (ICPP'06) 0-7695-2636-5/06 $20.00 © 2006