International Journal of Computer Applications (0975-8887)
Volume 49, No. 20, July 2012

Privacy Preservation using Association Rule Mining with Limited Side Effect

Deepti Gatne
Department of Computer Engineering, KKWIEER, Nasik

Prof. S. S. Sane
Department of Computer Engineering, KKWIEER, Nasik

Prof. Manoj Jhade
Department of Computer Engineering, KKWIEER, Nasik

ABSTRACT
This research is based on privacy preservation using association rule mining. Privacy-preserving data mining was proposed in response to concerns about protecting personal information from data mining algorithms. The proposed method focuses on minimizing the side effects caused by privacy preservation techniques, namely the loss of rules and the generation of false rules. One privacy preservation technique selectively modifies individual values in a database to prevent the discovery of a set of rules. Two known algorithms do this: ISL (Increase Support of Left) and DSR (Decrease Support of Right). Because ISL and DSR aim at hiding all sensitive rules, they cannot avoid undesired side effects: ISL results in the generation of false rules, whereas DSR results in the loss of rules. The proposed system suggests modifications to both algorithms so that the output is generated with limited side effects. It also decides which algorithm should be used to hide a specific rule.

General Terms
Data mining: Data mining is the process of extracting patterns from data. Modern businesses increasingly see it as an important tool for transforming data into business intelligence, giving an informational advantage.
Association rules: Association rules are statements of the form {X1, X2, ..., Xn} → Y, meaning that if we find all of X1, X2, ..., Xn in the market basket, then we have a good chance of finding Y.
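
As a minimal illustration of this reading of a rule, the sketch below stores market baskets as Python sets and measures how often Y appears among the baskets that contain all of X1, ..., Xn. The basket contents and the function name chance_of_consequent are made up for illustration and are not taken from the paper.

```python
# Toy market baskets; the item names are hypothetical.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk", "jam"},
]


def chance_of_consequent(baskets, antecedent, consequent):
    """Among baskets that contain every antecedent item, return the
    fraction that also contain the consequent item."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0  # the antecedent never occurs, so the rule never applies
    hits = sum(1 for b in with_antecedent if consequent in b)
    return hits / len(with_antecedent)


# Rule {bread, butter} -> milk: three baskets contain bread and butter,
# and two of those three also contain milk.
ratio = chance_of_consequent(baskets, {"bread", "butter"}, "milk")
print(ratio)
```
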
Support of the rule: The support supp(X) of an item set X is defined as the proportion of transactions in the data set that contain the item set.
Confidence of the association rule: Confidence is the ratio of the number of transactions that include all items in both the antecedent and the consequent (namely, the support of the rule) to the number of transactions that include all items in the antecedent: conf(X → Y) = supp(X ∪ Y) / supp(X).

Keywords
Privacy preservation, limited side effects, ISL, DSR

1. INTRODUCTION
Privacy-preserving data mining is a novel research direction in data mining and statistical databases in which data mining algorithms are analyzed for the side effects they incur in the process of privacy preservation. Privacy here means the logical security of data, not traditional data security concerns such as access control, theft, or hacking. The aim is to publish data in such a way that the information remains practically useful while the identity of an individual cannot be determined [4].

1.1 Motivation
The basic objective of this project is to provide privacy to a database with limited side effects. ISL and DSR are algorithms used for privacy preservation in association rule mining; both cause side effects in the form of false rules and lost rules. False rules are spurious rules that are generated as a result of the modification, and lost rules are non-sensitive strong rules that are unintentionally hidden. Is there a way to minimize the side effects caused by the ISL and DSR algorithms?

1.2 Existing systems
The common idea for modifying the database to hide rules is as follows. For a sensitive rule r: X → Y, deleting an item i ϵ X ∪ Y from transactions that contain X ∪ Y decreases both Sup(X ∪ Y) and Conf(r). Moreover, inserting an item i ϵ X into transactions that contain X \ {i} (all of X except i) and do not contain Y decreases Conf(r), because the support of X rises while the support of X ∪ Y is unchanged. The first strategy, called ISL, decreases the confidence of a rule by increasing the support of the item sets in its LHS (left-hand side).
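
Before the second strategy is described, the two modifications above can be checked on a toy database. The following is only a minimal sketch: the six transactions, the rule {a, b} → {c}, and the helper names support and confidence are assumptions made for illustration, and the code changes a single hand-picked transaction rather than implementing ISL or DSR.

```python
def support(db, itemset):
    """Proportion of transactions that contain every item in itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(db, lhs, rhs):
    """Support of (lhs union rhs) divided by support of lhs."""
    lhs_support = support(db, lhs)
    return support(db, lhs | rhs) / lhs_support if lhs_support else 0.0


# Hypothetical database and sensitive rule X -> Y.
db = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
    {"b", "d"},
    {"d"},
]
X, Y = {"a", "b"}, {"c"}

# Modification 1: delete item "c" from one transaction that contains X and Y.
# This lowers both the support of the rule and its confidence.
db_del = [set(t) for t in db]
db_del[0].discard("c")

# Modification 2: insert item "b" (an item of X) into a transaction that
# contains the rest of X and does not contain Y. The support of X rises,
# the support of the rule is unchanged, so the confidence drops.
db_ins = [set(t) for t in db]
db_ins[3].add("b")

for name, d in [("original", db), ("delete", db_del), ("insert", db_ins)]:
    print(name, support(d, X | Y), confidence(d, X, Y))
```

In this toy case the confidence of the rule falls from 2/3 to 1/3 under the deletion and to 1/2 under the insertion, while only the deletion changes the support of the rule itself.
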
The second approach, called DSR, reduces the confidence of the rule by decreasing the support of the item sets in its RHS (right-hand side) [2]. Both algorithms sequentially modify (i.e. delete items from or insert items into) all the records that contain a rule X → Y whose confidence is greater than MCT (the minimum confidence threshold). This results in side effects in the form of lost rules and false rules.

1.3 Concept or seed idea
The concept is to minimize the side effects caused by the privacy preservation process (i.e. the ISL and DSR algorithms). To minimize the side effects, two things can be changed:
i. Rather than modifying all the records, modify only a selected number of records, so that the confidence of the rule does not fall to 0.
ii. Rather than modifying records sequentially, modify selected records.
For each rule to be hidden, a careful decision must therefore be made about how many and which records to modify, so that the side effects are limited.

2. LITERATURE SURVEY
2.1 Existing Algorithms
There are two different algorithms, ISL and DSR. Here we discuss the DSR algorithm in detail. As already discussed, it reduces the confidence of the rule by decreasing the support of the item sets in its RHS (right-hand side).

2.1.1 DSR algorithm [3]
Input: source database D, MCT and MST (minimum support threshold) values, and the set of rules that need to be hidden.
Output: a transformed database D, in which rules containing X on the right-hand side (RHS) are hidden.
Algorithm:
1. Find all possible rules from the given items X.
2. Compute the confidence of all the rules.
3. For each rule containing h, compute the confidence of rule U.
4. For each rule U in which h is in the RHS:
4.1. If confidence(U) < min conf, then go to the next large 2-itemset; else go to step 5.