Parallel Processing Framework on a P2P System Using Map and Reduce Primitives
Kyungyong Lee*, Tae Woong Choi†, Arijit Ganguly‡, David I. Wolinsky*, P. Oscar Boykin*, Renato Figueiredo*
* ACIS Lab, Department of ECE, University of Florida
E-mail: {klee, davidiw, boykin, renato}@acis.ufl.edu
† Samsung SDS, Seoul, South Korea
E-mail: taewoong.choi@samsung.com
‡ Amazon Web Services, Amazon.com
E-mail: arijit@amazon.com
Abstract—This paper presents a parallel processing framework for structured Peer-To-Peer (P2P) networks. A parallel processing task is expressed using Map and Reduce primitives inspired by functional programming models. The Map and Reduce tasks are distributed for execution to a subset of nodes within a P2P network by using a self-organizing multicast tree. The distribution latency of the multicast method is O(log N), where N is the number of target nodes for task processing. Each node that receives a task performs the Map task, and the task results are summarized and aggregated in a distributed fashion at each node of the multicast tree during the Reduce task. We have implemented this framework on the Brunet P2P system, and the system currently supports predefined Map and Reduce tasks as well as tasks inserted through Remote Procedure Call (RPC) invocations. Simulation results demonstrate the scalability and efficiency of our parallel processing framework. An experiment on PlanetLab, which performs distributed K-Means clustering to gather statistics of connection latencies among P2P nodes, shows the applicability of our system to applications such as monitoring overlay networks.
Keywords-Parallel processing, Map, Reduce, P2P, Monitoring, Distributed data mining
I. INTRODUCTION
In recent years, Peer-To-Peer (P2P) systems have received considerable attention from industry and academia. Unlike in the client-server architecture, each peer in a P2P system participates in a virtual overlay network while acting as both a client and a server. Without a central server, each peer in a P2P network is responsible for providing information and services to, and retrieving them from, other peers in the overlay network. Despite the growing popularity of the P2P model, use cases of P2P systems have largely been limited to file-sharing applications (e.g., Gnutella and BitTorrent) and VoIP solutions (e.g., Skype).
Map and Reduce primitives are popular paradigms in functional programming languages. LISP defines Map as a function that is applied to successive sets of input data. Reduce is defined as a function that combines the elements of a sequence or aggregates the results computed from those elements. Erlang and Python provide Map and Reduce functions similar to those of LISP. Hadoop [1] and Google MapReduce [2] also use Map and Reduce concepts for large data processing jobs. Unlike these languages, Hadoop and Google MapReduce apply Map and Reduce primitives in distributed computing environments, freeing users from distributing parallel jobs and handling failed nodes.
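The functional-language notion of Map and Reduce described above can be illustrated with Python's built-in `map` and `functools.reduce`. This is a generic, single-machine example of the two primitives, not code from the framework presented in this paper.

```python
from functools import reduce
import operator

# Map: apply a function to each element of the input sequence.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))

# Reduce: combine the mapped elements into a single summary value,
# starting from the initial value 0.
total = reduce(operator.add, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```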
In this paper we present a decentralized parallel processing framework that uses Map and Reduce primitives on a structured P2P network for applications such as network status monitoring, resource discovery, and distributed data mining. Without a central broker node, our system relies on a self-organizing multicast tree for efficient task distribution. Each node in the multicast tree performs the Map function on its input data. In parallel with Map task execution, the task is propagated to child nodes in the multicast tree until it reaches all leaf nodes. Once the leaf nodes have computed their Map functions, the results are communicated up the tree. As the results propagate up the tree, aggregation and summarization happen at each intermediate node: each such node executes the Reduce function over the results obtained from its child nodes and from its local Map function.
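The map-down, reduce-up pattern described above can be sketched as a recursive traversal of a tree. This is a minimal single-process sketch under stated assumptions: the `Node` class and `tree_map_reduce` function are hypothetical, the tree is given explicitly rather than built by the P2P overlay, and children are visited sequentially, whereas the framework propagates the task to children in parallel with the local Map.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Node:
    """One peer in a hypothetical multicast tree, holding its local input data."""
    name: str
    data: List[int]
    children: List["Node"] = field(default_factory=list)


def tree_map_reduce(node: Node,
                    map_fn: Callable[[List[int]], Any],
                    reduce_fn: Callable[[Any, Any], Any]) -> Any:
    """Run map_fn locally, recurse into children, and reduce results on the way up."""
    # Map: compute this node's local result from its own input data.
    result = map_fn(node.data)
    # Forward the task to each child. The framework does this in parallel
    # with the local Map; this sketch visits children one after another.
    for child in node.children:
        child_result = tree_map_reduce(child, map_fn, reduce_fn)
        # Reduce: fold the child's already-aggregated subtree result into ours.
        result = reduce_fn(result, child_result)
    return result


# A four-node tree: a -> {b, c}, b -> {d}.
root = Node("a", [3, 5], [Node("b", [1], [Node("d", [7, 2])]), Node("c", [4])])

node_count = tree_map_reduce(root, lambda data: 1, operator.add)  # 4
data_sum = tree_map_reduce(root, sum, operator.add)               # 22
```

Because each intermediate node returns a single aggregated value to its parent, the root receives one result per child rather than one per node in the tree.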
We have implemented our parallel processing framework on a structured P2P framework, Brunet. Brunet implements Symphony [3], a 1-D Kleinberg small-world architecture [4], and we use the Brunet P2P overlay for connection management, routing, and task distribution. To provide an interface for defining Map and Reduce tasks, the system offers not only basic Map (e.g., a count function) and Reduce (e.g., addition and array concatenation) tasks but also an XML-RPC interface for registering user-defined Map and Reduce functions.
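The combination of predefined tasks and user-registered functions described above can be pictured as a name-keyed registry. The sketch below is hypothetical: `MAP_TASKS`, `REDUCE_TASKS`, `register_map`, `register_reduce`, and `run_local` are illustrative names, not Brunet's API, and the XML-RPC transport is omitted, so registration here is a plain function call standing in for an RPC invocation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registries keyed by task name (not Brunet's actual API).
MAP_TASKS: Dict[str, Callable[[Any], Any]] = {
    # count: each node contributes 1, so summing yields the number of nodes.
    "count": lambda _local_input: 1,
}
REDUCE_TASKS: Dict[str, Callable[[Any, Any], Any]] = {
    "add": lambda a, b: a + b,
    # Array concatenation: merge lists of per-node results.
    "concat": lambda a, b: list(a) + list(b),
}


def register_map(name: str, fn: Callable[[Any], Any]) -> None:
    """Stand-in for registering a user-defined Map function (e.g. via XML-RPC)."""
    MAP_TASKS[name] = fn


def register_reduce(name: str, fn: Callable[[Any, Any], Any]) -> None:
    """Stand-in for registering a user-defined Reduce function."""
    REDUCE_TASKS[name] = fn


def run_local(map_name: str, reduce_name: str, local_input: Any,
              child_results: List[Any]) -> Any:
    """One node's step: apply the named Map locally, then fold in child results."""
    result = MAP_TASKS[map_name](local_input)
    reducer = REDUCE_TASKS[reduce_name]
    for child_result in child_results:
        result = reducer(result, child_result)
    return result


# A user-defined Map that reports a node's latency sample as a one-element list.
register_map("latency", lambda sample_ms: [sample_ms])

count = run_local("count", "add", None, [3, 2])                    # 6
latencies = run_local("latency", "concat", 12.5, [[8.0], [20.1]])  # [12.5, 8.0, 20.1]
```

Looking tasks up by name lets a request carry only short identifiers, with each node resolving them to local functions; whether the framework ships function names or function bodies over RPC is not stated in this section.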
The major contributions of this work are as follows:
• A novel P2P parallel processing framework that uses multicast trees and Map and Reduce primitives
• An implementation and real-world deployment of our system that show its applicability and feasibility
The rest of this paper is organized as follows. Section II introduces Map and Reduce primitives in functional programming languages. Section III presents the architecture of our parallel processing framework. Section IV describes example use cases of our system. Section V discusses our system. Section VI evaluates our system through simulation and a PlanetLab experiment. Section VII discusses related work. Section VIII concludes this paper.
2011 IEEE International Parallel & Distributed Processing Symposium
1530-2075/11 $26.00 © 2011 IEEE
DOI 10.1109/IPDPS.2011.315