International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 7, July 2013 ISSN 2319 - 4847

ABSTRACT
Publishing data about individuals without revealing sensitive information about them is an important problem. Distributed data mining applications, such as those dealing with health care, finance, counter-terrorism and homeland defence, use sensitive data from distributed databases held by different parties. This comes into direct conflict with an individual's need for and right to privacy. It is thus of great importance to develop adequate security techniques for protecting the privacy of individual values used for data mining. Here, we study how to maintain privacy in distributed mining of frequent itemsets. That is, we study how two (or more) parties can find frequent itemsets in a distributed database without revealing each party's portion of the data to the other. In this paper, we consider a privacy-preserving naive Bayes classifier for horizontally partitioned distributed data and propose a two-party protocol and a multi-party protocol to achieve it. Guided by classification accuracy and k-anonymity constraints, the proposed data mining privacy by decomposition (DMPD) method uses a genetic algorithm to search for an optimal feature set partitioning. Multiobjective optimization methods are used to examine the tradeoff between privacy and predictive performance.

Keywords- Distributed database, privacy, data mining, classification, k-anonymity

1. INTRODUCTION
Information sharing is a vital building block for today's business world. Data mining techniques have been developed successfully to extract knowledge in support of a variety of domains: marketing, weather forecasting, medical diagnosis, and national security. But it is still a challenge to mine certain kinds of data without violating the data owners' privacy.
As data mining becomes more pervasive, privacy concerns are increasing [1]. Distributed data mining is a process for extracting globally interesting associations, classifiers, clusters, and other patterns from distributed data [2], where the data can be partitioned into many parts either vertically or horizontally [3]. Vertical partitioning means that different information about the same set of entities is distributed across different sites. For example, banks collect financial transaction information while the IRS collects tax information. Horizontal partitioning means that the same set of information about different entities is distributed across different sites. For example, different hospitals collect the same type of patient data.

Distributed data mining can be classed into two categories [4]. The first is server-to-server, where data are distributed across several servers. The second is client-to-server, where data reside on each client while a server or a data miner performs mining tasks on the aggregate data from the clients.

A typical example in distributed data mining where privacy can be of great importance is the field of medical research. Consider the case where a number of different hospitals wish to jointly mine their patient data for the purpose of medical research. Privacy policy and law do not allow these hospitals even to pool their data or reveal it to each other, because patient records are confidential. Although hospitals are allowed to release data as long as identifiers such as name and address are removed, this is not safe enough, because a re-identification attack can link different public databases to re-identify the original subjects [5]. Consequently, providing privacy protection may be critical to the success of data collection, and to the success of the entire task of distributed data mining.

Privacy-preserving data mining was first introduced by Agrawal and Srikant [6] and Lindell and Pinkas [7] independently in 2000.
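The horizontal and vertical partitioning schemes introduced earlier in this section can be made concrete with a small example. The following is a minimal sketch only: the record fields, the site names, and all values are hypothetical and do not come from the paper, and no privacy protection is applied. It shows only how the same table ends up split across sites under each scheme.

```python
# Toy table: one record per entity. All fields and values are made up.
records = [
    {"id": 1, "age": 34, "balance": 5200, "tax_paid": 910},
    {"id": 2, "age": 51, "balance": 6100, "tax_paid": 1320},
    {"id": 3, "age": 29, "balance": 4500, "tax_paid": 700},
    {"id": 4, "age": 63, "balance": 3800, "tax_paid": 560},
]


def horizontal_partition(rows, n_sites):
    """Each site holds ALL attributes for a disjoint subset of entities."""
    return [rows[i::n_sites] for i in range(n_sites)]


def vertical_partition(rows, attribute_groups, key="id"):
    """Each site holds a SUBSET of attributes for every entity.

    The shared key is kept at every site so records can be matched up.
    """
    return [
        [{key: row[key], **{a: row[a] for a in attrs}} for row in rows]
        for attrs in attribute_groups
    ]


# Horizontal: two sites collect the same fields about different entities.
site_a, site_b = horizontal_partition(records, 2)

# Vertical: a bank-like site and a tax-office-like site hold different
# fields about the same entities (hypothetical grouping).
bank_view, tax_view = vertical_partition(records, [["age", "balance"], ["tax_paid"]])
```

Under horizontal partitioning the sites share a schema but no entities, whereas under vertical partitioning they share the entities (through the key) but not the attributes.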
Since then, a number of privacy-preserving data mining algorithms and protocols have been proposed, such as those for association rule mining [8–12], clustering [13,14], naive Bayes classifiers [15–18], etc. So far, there have been two main approaches to privacy-preserving data mining, as follows. One is the randomization approach. The typical example is the Agrawal–Srikant algorithm [6], in which data are randomized through the value class membership (values of an attribute are discretized into intervals and the interval in

Distributed Data Mining Privacy by Decomposition (DDMPD) with Naive Bayes Classifier and Genetic Algorithm
Lambodar Jena 1, Narendra Kumar Kamila 2
Department of Computer science & Engineering, 1 Gandhi Engineering College, Bhubaneswar. 2 C.V.Raman college of Engineering, Bhubaneswar.