International Journal of Database Theory and Application
Vol.9, No.7 (2016), pp.147-156
http://dx.doi.org/10.14257/ijdta.2016.9.7.13
ISSN: 2005-4270 IJDTA
Copyright ⓒ 2016 SERSC

An Efficient Method for Protecting High Utility Itemsets in Utility Mining

Anshu Chaturvedi, D. N. Goswami and Rishi Soni*
Madhav Institute of Technology and Science, Gwalior – 474005, India
Jiwaji University, Gwalior – 474011, India
Jiwaji University, Gwalior – 474011, India
anshu_chaturvedi@yahoo.co.in, goswamidn@yahoo.com, rishisoni17@gmail.com
*Corresponding author

Abstract

Privacy preserving data mining (PPDM) has become a popular research direction in data mining. PPDM develops algorithms that modify the utility values of the original data in order to protect sensitive information from unauthorized users. Protecting data against illegal access becomes a serious issue when the data must be shared over a network. Many approaches have been proposed to hide sensitive information. In this study, we propose an efficient method for protecting high utility itemsets using a distortion technique, in which the values of high utility items are altered to achieve privacy. The algorithm is designed to handle privacy without disclosing sensitive information, and it can completely hide any given utility items by scanning the data iteratively. Compared with an existing method, the results show a significant reduction in execution time.

Keywords: Privacy preserving, Utility mining, High utility itemsets, Sanitization process

1. Introduction

Data mining enables us to discover previously unknown and potentially useful information from large databases. The knowledge discovered from this data plays an important role in decision making in areas such as business management, marketing analysis, medical analysis, criminal records, and credit records
[1]. Association rule mining is one of the most common approaches in data mining; it finds all itemsets whose support values exceed a specified threshold. Utility mining was introduced to derive the utility of an itemset, and it is claimed to be better than association rule mining in certain respects. The privacy preserving aspect of utility mining has recently gained momentum in research because data sets often contain highly sensitive information that no user would want leaked to outsiders. Sensitive information that can be mined from a database should therefore be separated out, because it can compromise data privacy when the database is shared on a network. However, a key problem is the need to maintain the confidentiality of the disclosed data without hindering the legitimate needs of the data user. Doing so requires modifying data values and relationships (utility itemsets). Striking the right balance between disclosure and hiding is difficult [2-3]. This balance can be achieved largely by hiding the high utility itemsets that expose the sensitive part of the data, because associations among the data are what most data users capture. Such vulnerability of high utility itemsets poses a great threat to the data if the data is in the