Peer-to-Peer Netw. Appl. (2011) 4:192–209
DOI 10.1007/s12083-010-0075-1

Multi-objective optimization based privacy preserving distributed data mining in Peer-to-Peer networks

Kamalika Das · Kanishka Bhaduri · Hillol Kargupta

Received: 31 December 2009 / Accepted: 3 June 2010 / Published online: 22 June 2010
© Springer Science+Business Media, LLC 2010

Abstract This paper proposes a scalable, local privacy-preserving algorithm for distributed Peer-to-Peer (P2P) data aggregation, useful for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection, and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works asynchronously through local interactions, and it is highly scalable. In particular, it addresses the distributed computation of the sum of a set of numbers stored at different peers in a P2P network, in the context of a P2P web mining application. The proposed optimization-based privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently used primitive, the proposed approach is likely to have a significant impact on many data mining tasks such as multi-party privacy-preserving clustering, frequent itemset mining, and statistical aggregate computation.

Keywords Privacy preserving · Data mining · Peer-to-Peer

A shorter version of this paper was published in the IEEE P2P'09 conference.
This work was supported by AFOSR MURI grant 2009-11.

K. Das (corresponding author)
Stinger Ghaffarian Technologies Inc., NASA Ames Research Center, MS 269-3, Moffett Field, CA 94035, USA
e-mail: Kamalika.Das@nasa.gov

K. Bhaduri
Mission Critical Technologies Inc., NASA Ames Research Center, MS 269-2, Moffett Field, CA 94035, USA
e-mail: Kanishka.Bhaduri-1@nasa.gov

H. Kargupta
CSEE Dept., University of Maryland, Baltimore County, MD 21250, USA
e-mail: hillol@cs.umbc.edu

H. Kargupta
AGNIK LLC, Columbia, MD 21045, USA

1 Introduction

Privacy-preserving data mining (PPDM) is a requirement in an increasing number of multi-party applications where the data is distributed among many nodes in a network. Web mining applications in Peer-to-Peer (P2P) networks [6, 17] and cross-domain network threat management systems for analyzing cyber-terrorism trends¹ are some examples where data privacy is an important issue. In such large distributed environments, PPDM algorithms are unlikely to work unless they offer scalability and support heterogeneous privacy models. Scalability can be addressed by local algorithms, in which the communication overhead is bounded by a constant or a slowly growing polynomial [6, 25]. In a multi-party environment such as the Internet, different users may have different privacy requirements. Hence, a heterogeneous privacy model gives parties in such scenarios the autonomy to optimize their privacy cost requirements. This paper takes a step toward developing such a model for privacy-preserving data aggregation in a P2P network. The main

¹ http://www.agnik.com/PursuitFlyer.pdf