Journal of Scientific & Industrial Research
Vol. 75, July 2016, pp. 399-403

Hybrid Evolutionary Algorithm for Preserving Privacy of Sensitive Data in Quantitative Databases

K Sathiyapriya* and G S Sadasivam
Department of Computer Science and Engineering, PSG College of Technology, Coimbatore, India

Received 13 July 2015; revised 29 March 2016; accepted 4 May 2016

Association rule mining has been widely used in various applications. However, abuse of this technique may lead to the discovery of sensitive information. In recent times, researchers have made efforts to hide sensitive association rules. However, most of the proposed techniques apply to binary datasets and suffer from the side effects of lost rules and ghost rules. Most business, medical and scientific domains have quantitative values for their attributes, yet limited research is available on hiding sensitive information in quantitative data. The aims of privacy-preserving quantitative association rule mining are to:
i. prevent the discovery of sensitive information;
ii. not compromise access to and use of non-sensitive data;
iii. be usable on large amounts of data;
iv. avoid exponential computational complexity.
In this paper, a hybrid evolutionary algorithm is proposed for effectively hiding sensitive quantitative association rules and for improving the utility of the database. The performance of the proposed system is compared with an existing algorithm by measuring the number of lost rules, the number of ghost rules and the number of modifications to the original data.

Keywords: Association rule, Sensitive data, Privacy Preservation, Genetic Algorithm, PSO.

Introduction

Association rule mining is a data mining technique for discovering hidden, useful information in large databases. An association rule is defined as an implication of the form X→Y, where X, Y ⊆ I, I is the set of items or attributes in the dataset, and X∩Y = Ø.
The sets of items X and Y are called the antecedent (Left Hand Side (LHS)) and the consequent (Right Hand Side (RHS)) of the rule, respectively. Support and confidence are the measures used to assess the interestingness of a rule. The confidence is calculated as |X ∪ Y| / |X|, where |X| is the number of transactions containing X and |X ∪ Y| is the number of transactions that contain both X and Y. The support of the rule is the fraction of transactions that contain both X and Y, calculated as |X ∪ Y| / N, where N is the number of transactions in the dataset D. Many approaches have been proposed to find interesting association rules from quantitative values [2,3,4].

Data mining necessitates the sharing of data, which has proven beneficial for business partnerships in applications such as business planning and marketing. This data has to be protected against undesired modification, destruction and unauthorized reading. With increasing knowledge discovery capabilities, protection against undesired or unauthorized extraction of knowledge from data [5] becomes important. The challenge is to protect strategic decisions without giving up the benefits of association rule mining.

An association rule is characterized as sensitive if its confidence is above the disclosure threshold. Sensitive rules should be made uninteresting before the dataset is released to the public. As the interestingness of an association rule is measured in terms of confidence and support, a sensitive rule can be hidden by reducing its support and confidence below the specified threshold. The motivation for this research is to design a sensitive rule hiding method for quantitative data that perturbs the data such that the released data continue to be useful, without compromising security. The data can be perturbed in different ways, resulting in numerous solutions in the solution space.
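The support and confidence definitions above, and the idea of hiding a rule by perturbing the data, can be illustrated with a minimal Python sketch. It is not the method proposed in this paper: it assumes a toy binary (item-set) dataset rather than quantitative values, the function names and thresholds are hypothetical, and it assumes that a rule stops being reported as soon as either measure falls below its threshold.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def count_containing(dataset: List[Transaction], items: FrozenSet[str]) -> int:
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in dataset if items <= t)


def support(dataset: List[Transaction], lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """|X ∪ Y| / N: fraction of transactions containing both X and Y."""
    return count_containing(dataset, lhs | rhs) / len(dataset)


def confidence(dataset: List[Transaction], lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """|X ∪ Y| / |X|; defined here as 0.0 when no transaction contains X."""
    lhs_count = count_containing(dataset, lhs)
    if lhs_count == 0:
        return 0.0
    return count_containing(dataset, lhs | rhs) / lhs_count


def is_hidden(dataset, lhs, rhs, min_support: float, min_confidence: float) -> bool:
    """Assumption: a rule is hidden once either measure drops below its threshold."""
    return (support(dataset, lhs, rhs) < min_support
            or confidence(dataset, lhs, rhs) < min_confidence)


# Toy data (hypothetical): the sensitive rule is {bread} -> {milk}.
X, Y = frozenset({"bread"}), frozenset({"milk"})
original = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
]

# One possible perturbation: drop "milk" from a single supporting transaction.
sanitized = list(original)
sanitized[2] = frozenset({"bread"})

print(support(original, X, Y), confidence(original, X, Y))    # 0.6 0.75
print(support(sanitized, X, Y), confidence(sanitized, X, Y))  # 0.4 0.5
print(is_hidden(sanitized, X, Y, min_support=0.3, min_confidence=0.6))  # True
```

In this toy case a single modification lowers the confidence from 0.75 to 0.5, which is enough to hide the rule at a confidence threshold of 0.6. Other perturbations would hide the same rule at a different cost to the remaining rules, which is why the choice among them is treated as an optimization problem.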
Each solution provides a different degree of compromise between the knowledge mined and the security of private information. We need to choose an optimal solution that maximizes the knowledge mined without compromising security.

*Author for correspondence. E-mail: sathya_jambai@yahoo.com

In Particle Swarm Optimization (PSO), it is possible for the particles to explore the search space exhaustively to find better