Peer-to-Peer Netw. Appl. (2011) 4:192–209
DOI 10.1007/s12083-010-0075-1
Multi-objective optimization based privacy preserving
distributed data mining in Peer-to-Peer networks
Kamalika Das · Kanishka Bhaduri · Hillol Kargupta
Received: 31 December 2009 / Accepted: 3 June 2010 / Published online: 22 June 2010
© Springer Science+Business Media, LLC 2010
Abstract This paper proposes a scalable, local privacy-preserving
algorithm for distributed Peer-to-Peer (P2P) data aggregation useful
for many advanced data mining/analysis tasks such as average/sum
computation, decision tree induction, feature selection, and more.
Unlike most multi-party privacy-preserving data mining algorithms,
this approach works in an asynchronous manner through local
interactions and it is highly scalable. It particularly deals with the
distributed computation of the sum of a set of numbers stored at
different peers in a P2P network in the context of a P2P web mining
application. The proposed optimization-based privacy-preserving
technique for computing the sum allows different peers to specify
different privacy requirements without having to adhere to a global set
of parameters for the chosen privacy model. Since distributed sum
computation is a frequently used primitive, the proposed approach is
likely to have significant impact on many data mining tasks such as
multi-party privacy-preserving clustering, frequent itemset mining,
and statistical aggregate computation.
Keywords Privacy preserving · Data mining · Peer-to-Peer
A shorter version of this paper was published at the IEEE
P2P’09 conference. This work was supported by AFOSR
MURI grant 2009-11.
K. Das (✉)
Stinger Ghaffarian Technologies Inc., NASA Ames
Research Center, MS 269-3, Moffett Field, CA 94035, USA
e-mail: Kamalika.Das@nasa.gov
K. Bhaduri
Mission Critical Technologies Inc., NASA Ames Research
Center, MS 269-2, Moffett Field, CA 94035, USA
e-mail: Kanishka.Bhaduri-1@nasa.gov
H. Kargupta
CSEE Dept., University of Maryland, Baltimore County,
MD 21250, USA
e-mail: hillol@cs.umbc.edu
H. Kargupta
AGNIK LLC, Columbia, MD 21045, USA
1 Introduction
Privacy-preserving data mining (PPDM) is a requirement in an
increasing number of multi-party applications where the data is
distributed among many nodes in a network. Web mining applications in
Peer-to-Peer (P2P) networks [6, 17] and cross-domain network threat
management systems for analyzing cyber-terrorism trends¹ are some
examples where data privacy is an important issue. In such large
distributed environments, PPDM algorithms are unlikely to work unless
they can offer scalability and heterogeneous privacy models.
Scalability can be addressed by local algorithms in which the
communication overhead is bounded by a constant or a slowly growing
polynomial [6, 25]. In a multi-party environment such as the Internet,
different users may have different privacy requirements. Hence a
heterogeneous privacy model in such scenarios gives parties the
autonomy to optimize their privacy cost requirements. This paper takes
a step toward developing such a model for privacy-preserving data
aggregation in a P2P network. The main
¹ http://www.agnik.com/PursuitFlyer.pdf