American Journal of Networks and Communications 2015; 4(3-1): 45-53
Published online February 10, 2015 (http://www.sciencepublishinggroup.com/j/ajnc)
doi: 10.11648/j.ajnc.s.2015040301.18
ISSN: 2326-893X (Print); ISSN: 2326-8964 (Online)

Privacy preserving data publishing through slicing

Shivani Rohilla, Megha Sharma, A. Kulothungan, Manish Bhardwaj
Department of Computer Science and Engineering, SRM University, NCR Campus, Modinagar, Ghaziabad, India

Email address:
shivani.engineer@gmail.com (S. Rohilla), megha.tech09@gmail.com (M. Sharma), kulosoft@gmail.com (A. Kulothungan), aapkaapna13@gmail.com (M. Bhardwaj)

To cite this article:
Shivani Rohilla, Megha Sharma, A. Kulothungan, Manish Bhardwaj. Privacy Preserving Data Publishing through Slicing. American Journal of Networks and Communications. Special Issue: Ad Hoc Networks. Vol. 4, No. 3-1, 2015, pp. 45-53. doi: 10.11648/j.ajnc.s.2015040301.18

Abstract: Microdata publishing must preserve privacy, because microdata may contain sensitive information about individuals. Various anonymization techniques, such as generalization and bucketization, have been designed for privacy-preserving microdata publishing. Generalization does not work well for high-dimensional data. Bucketization does not prevent membership disclosure and cannot be applied to data that lack a clear separation between quasi-identifiers and sensitive attributes. The attributes in each record can be categorized as follows:
1) Identifiers, such as Name or Social Security Number, are attributes that uniquely identify individuals.
2) Sensitive Attributes (SAs), such as disease and salary, are attributes whose values must be protected.
3) Quasi-Identifiers (QIs), such as zipcode, age, and sex, are attributes whose values, when taken together, can potentially identify an individual.
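The attribute categories and the two baseline techniques can be made concrete with a small Python sketch. Both techniques first drop identifiers and group the remaining tuples into buckets; generalization then replaces the QI values in each bucket with less specific values shared by the whole bucket, while bucketization keeps the exact QI values and randomly permutes the SA values within each bucket. The toy records, the column roles, the bucket size of 3, and the helper names (`drop_identifiers`, `partition`, `generalize`, `bucketize`) are all hypothetical, and the naive consecutive partitioning stands in for a real privacy-driven partitioning algorithm; this is an illustration, not the authors' implementation.

```python
import os
import random

# Hypothetical attribute roles for a toy table: Identifier / QI / SA.
IDENTIFIERS = {"Name"}
QUASI_IDENTIFIERS = ["Age", "Sex", "Zipcode"]
SENSITIVE = "Disease"

# Hypothetical microdata, invented for illustration only.
MICRODATA = [
    {"Name": "Alice", "Age": 22, "Sex": "F", "Zipcode": "47906", "Disease": "dyspepsia"},
    {"Name": "Bob", "Age": 22, "Sex": "M", "Zipcode": "47906", "Disease": "flu"},
    {"Name": "Carol", "Age": 33, "Sex": "F", "Zipcode": "47905", "Disease": "flu"},
    {"Name": "Dave", "Age": 52, "Sex": "F", "Zipcode": "47905", "Disease": "bronchitis"},
    {"Name": "Eve", "Age": 54, "Sex": "M", "Zipcode": "47302", "Disease": "flu"},
    {"Name": "Frank", "Age": 60, "Sex": "M", "Zipcode": "47302", "Disease": "dyspepsia"},
]


def drop_identifiers(rows):
    """Step shared by both techniques: remove explicit identifier columns."""
    return [{k: v for k, v in row.items() if k not in IDENTIFIERS} for row in rows]


def partition(rows, size):
    """Naive horizontal partition into consecutive buckets of `size` tuples."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def generalize(bucket):
    """Generalization: replace QI values with coarser values shared by the bucket."""
    ages = [row["Age"] for row in bucket]
    age = f"[{min(ages)}-{max(ages)}]"  # age range covering the bucket
    sexes = {row["Sex"] for row in bucket}
    sex = next(iter(sexes)) if len(sexes) == 1 else "*"  # suppress if mixed
    zips = [row["Zipcode"] for row in bucket]
    prefix = os.path.commonprefix(zips)  # keep the shared leading digits
    zipcode = prefix + "*" * (len(zips[0]) - len(prefix))
    return [
        {"Age": age, "Sex": sex, "Zipcode": zipcode, SENSITIVE: row[SENSITIVE]}
        for row in bucket
    ]


def bucketize(bucket, rng):
    """Bucketization: keep exact QI values, randomly permute SA values in the bucket."""
    sa_values = [row[SENSITIVE] for row in bucket]
    rng.shuffle(sa_values)  # breaks the link between a QI tuple and its SA value
    return [
        {**{q: row[q] for q in QUASI_IDENTIFIERS}, SENSITIVE: sa}
        for row, sa in zip(bucket, sa_values)
    ]


buckets = partition(drop_identifiers(MICRODATA), size=3)
generalized = [generalize(b) for b in buckets]
rng = random.Random(0)  # fixed seed so the permutation is reproducible
bucketized = [bucketize(b, rng) for b in buckets]

if __name__ == "__main__":
    for gen_bucket, buck_bucket in zip(generalized, bucketized):
        print("generalized:", gen_bucket)
        print("bucketized: ", buck_bucket)
```

In the bucketized output every exact QI combination is still published, which illustrates why bucketization on its own does not prevent membership disclosure.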
Data anonymization enables the transfer of information across a boundary, such as between two departments within an agency or between two agencies, while reducing the risk of unintended disclosure, and, in certain environments, in a manner that still allows evaluation and analytics after anonymization. Here, we present a novel technique called slicing, which partitions the data both horizontally and vertically. Slicing preserves better data utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute.

Keywords: PPDP, AG, CG, PT

1. Introduction

Data anonymization is a technology that converts clear text into a non-human-readable form. Data anonymization techniques for privacy-preserving data publishing have received considerable attention in recent years. Detailed data (also called microdata) contains information about a person, a household, or an organization. Data mining is the process of analysing data from different perspectives and summarizing it into useful information. For knowledge discovery in databases, techniques such as clustering, association rules, regression, classification, decision trees, and genetic algorithms are widely used today. Data mining is also used in science and engineering fields such as genetics and bioinformatics.

In both generalization and bucketization, one first removes identifiers from the data and then partitions tuples into buckets. The two techniques differ in the next step. Generalization transforms the QI values in each bucket into “less specific but semantically consistent” values so that tuples in the same bucket cannot be distinguished by their QI values. In bucketization, one separates the SAs from the QIs by randomly permuting the SA values in each bucket. The anonymized data consists of a set of buckets with permuted sensitive attribute values.

Fig. 1.1.
Anonymization of data

In this information age, data and the knowledge extracted from it by data mining techniques are key assets driving research, innovation, and policy-making. Many agencies and organizations have recognized the need to accelerate such trends and are therefore willing to release the data they have collected to other parties for purposes such as research and the formulation of public policies. However, the data publication process is still very difficult today. Data often contains personally identifiable information, and therefore releasing such data may result in privacy breaches; this is the