Adaptively Routing P2P Queries Using Association Analysis
Brian D. Connelly, Christopher W. Bowron, Li Xiao, Pang-Ning Tan, and Chen Wang
Department of Computer Science and Engineering
Michigan State University
East Lansing, MI 48824 US
{connel42,bowronch,lxiao,ptan,wang}@cse.msu.edu
Abstract
Unstructured peer-to-peer networks have become a very
popular method for content distribution in the past few
years. By not enforcing strict rules on the network’s
topology or content location, such networks can be created
quickly and easily. Unfortunately, because these networks
lack structure, finding content requires flooding query
messages to nodes throughout the network, which generates
a large amount of traffic. This work borrows
the technique of association analysis from the data mining
community and extends it to intelligently forward queries
through the network. Because queries are forwarded to only
a small subset of a node’s neighbors, the number of times
those queries are propagated is also reduced, which results
in considerably less network traffic. These savings enable
the networks to scale to much larger sizes, allowing more
content to be shared, more redundancy to be added to the
system, and more users to take advantage of such networks.
I. Introduction
The popularity and number of peer-to-peer (P2P) networks
have exploded in the past several years. They have proved
to be a viable means of disseminating data across a
network. Aside from the legal issues faced
by a few existing networks regarding the distribution of
copyrighted material, P2P networks also serve many useful
legitimate purposes, such as load balancing, providing
more flexible and up-to-date routing information [1], man-
aging voice traffic [2], and offering efficient downloads of
free software [3].
Many of the networks in use today follow the unstructured
peer-to-peer model, which was first widely used in the
Gnutella [4] network. These networks impose no rules on
how nodes organize themselves or where shared content is
located. This allows nodes to join and leave the system
without significantly affecting the rest of the network.
One disadvantage of this approach, however, is that the
location of shared content is not known in advance. To
find a particular piece of content, a user “floods” the
network with query messages: the query is sent to all of
the peer’s neighbors, which in turn forward it to all of
their neighbors, and so on. The query therefore reaches
all nodes, so if any node shares content that matches the
user’s query, it will be found. Flooding, however, creates
a very large number of messages, and the traffic on the
network grows considerably with each node that joins,
since that node forwards every query it receives to each of
its neighbors and also issues new queries of its own, each
of which is flooded in turn. As a result of this large
volume of traffic, current unstructured P2P networks reach
a limit on the number of users who can use the system
concurrently.
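The flooding procedure just described can be illustrated with a small simulation. This is a minimal sketch, not code from any real client: it assumes an undirected topology given as an adjacency dictionary, duplicate suppression (each node processes a given query only once), and a Gnutella-style hop limit `ttl`. The function name `flood` and its parameters are hypothetical. It counts every forwarded message, including the redundant ones that reach nodes that have already seen the query.

```python
from collections import deque


def flood(graph, source, ttl):
    """Simulate flooding one query from `source` (illustrative sketch).

    graph: dict mapping each node to a list of its neighbors (undirected).
    ttl:   assumed hop limit; a node receiving the query with 0 hops left
           does not forward it further.
    Returns (set of nodes reached, total number of query messages sent).
    """
    seen = {source}  # nodes that have already processed this query
    messages = 0
    # Each entry: (node holding the query, peer it came from, hops left)
    queue = deque([(source, None, ttl)])
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # hop limit reached: stop forwarding
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # do not send the query straight back to its sender
            messages += 1  # every forward costs a message, even to duplicates
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, node, hops_left - 1))
    return seen, messages


# Four fully connected peers: reaching 3 others costs 9 messages.
k4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(flood(k4, 0, ttl=3))
```

In the fully connected example, each peer that receives the query re-forwards it to two peers that already have it, so six of the nine messages are redundant. This redundancy is the overhead that grows as more densely connected nodes join.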
This paper presents a new approach to limiting the
number of queries that are flooded in the network, based
on association analysis, a technique that has been studied
extensively in the data mining community. We extend
association analysis with measures of quality for rule sets
and drive the rule generation process with feedback, so that
nodes intelligently forward each query to the subset of
neighbors most likely to continue forwarding it toward
nodes that share the desired content.
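This section does not yet describe how rules are built, so the following is only a hypothetical sketch of the general idea, not the authors' algorithm. It assumes that each node logs past queries as (query terms, neighbor that returned a match) pairs. From this log it mines single-term rules of the form term → neighbor, using the standard support and confidence measures of association analysis. When no rule applies, it falls back to flooding. The rule-set quality measures and feedback mentioned above are not modeled. The names `mine_rules` and `select_neighbors`, the log format, and the threshold values are all illustrative assumptions.

```python
from collections import Counter


def mine_rules(history, min_support, min_confidence):
    """Mine hypothetical single-term rules `term -> neighbor`.

    history: list of (query_terms, neighbor) pairs, one per logged response
             in which `neighbor` returned a match (assumed log format).
    Returns a dict mapping each term to the set of neighbors it predicts.
    """
    total = len(history)
    term_counts = Counter()   # responses whose query contained the term
    pair_counts = Counter()   # responses containing the term, from this neighbor
    for terms, neighbor in history:
        for term in set(terms):
            term_counts[term] += 1
            pair_counts[(term, neighbor)] += 1

    rules = {}
    for (term, neighbor), count in pair_counts.items():
        support = count / total                 # fraction of all responses
        confidence = count / term_counts[term]  # P(neighbor | term)
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(term, set()).add(neighbor)
    return rules


def select_neighbors(query_terms, rules, all_neighbors):
    """Forward only to neighbors predicted by some rule; otherwise flood."""
    neighbors = set(all_neighbors)
    chosen = set()
    for term in query_terms:
        chosen |= rules.get(term, set())
    chosen &= neighbors  # ignore predicted peers that are no longer neighbors
    return chosen if chosen else neighbors  # no rule fired: flood to all


history = [
    (["jazz", "mp3"], "A"), (["jazz"], "A"), (["jazz"], "B"),
    (["linux", "iso"], "C"), (["linux"], "C"),
]
rules = mine_rules(history, min_support=0.15, min_confidence=0.6)
print(select_neighbors(["jazz"], rules, ["A", "B", "C", "D"]))
```

With this log, most "jazz" responses came through neighbor A, so a new "jazz" query goes to A alone. A query matching no rule is still sent to every neighbor. The fallback is a design choice that keeps content reachable while rules are still sparse.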
Because this selective forwarding significantly reduces the
number of flooded query messages while preserving the
ability to locate content, the overall traffic on the
network is decreased, allowing more users to make use
Proceedings of the 2006 International Conference on Parallel Processing (ICPP'06)
0-7695-2636-5/06 $20.00 © 2006