International Journal of Computer Applications (0975 – 8887)
Volume 140 – No.8, April 2016

Privacy Preserving Techniques on Centralized, Distributed and Social Network Data – A Review

R. Padmaja
Research Scholar, SCSE, VIT University, Vellore

V. Santhi, PhD
Associate Professor, SCSE, VIT University, Vellore

ABSTRACT
Privacy Preserving Data Publishing refers to publishing data in such a way that the privacy of individuals is preserved. The published data can then be used for various data analysis and data mining tasks. Techniques used to preserve the privacy of individuals before publishing are called anonymization techniques. Initially, only centralized data needed to be published for analysis and mining. Later, with the advent of the Internet, it became necessary to publish distributed and social network data as well. The anonymization techniques applied to centralized data can be applied to both distributed and social network data with minor modifications. This paper presents a brief review of anonymization techniques such as k-anonymity and l-diversity on centralized, distributed and social network data.

Keywords
SMC, TTP

1. INTRODUCTION
An organization's data generally contains personal information about individuals, so privacy must be preserved before the data is released. Techniques used for privacy preserving data publishing are called anonymization techniques.

Figure 1: Privacy Preserving Data Publishing

Anonymization can be applied to centralized, distributed and social network data. The popular anonymization techniques for centralized data are k-anonymity and l-diversity. The same techniques, with minor modifications, can be applied to distributed and social network data.

Figure 2: Anonymization Techniques can be applied on different databases

1.1 Relational Data
Organizations often need to publish their data for research or mining. Such data is generally stored in a table in which each record corresponds to one individual.
The attributes of such a table fall into three categories. Explicit identifiers are attributes that identify a tuple directly; SSN is an example. Quasi identifiers are attributes whose values, taken together, can be used to identify an individual. Sensitive attributes are attributes whose values are considered sensitive [1]. For example, medical organizations need to publish patient data for medical research. Since patient data contains sensitive information, it should not be published as is; that is, the privacy of the patients must be protected before publishing.

Two types of information disclosure are possible [2,3]: identity disclosure and attribute disclosure. Identity disclosure occurs when an individual is linked to a particular record in the published table. Attribute disclosure occurs when the published data helps an adversary infer a characteristic of an individual more accurately. Identity disclosure often leads to attribute disclosure. Anonymization techniques help to limit such disclosures.

The first step of anonymization is removing explicit identifiers, but that is not enough, because an adversary can still identify an individual from the quasi identifiers. A common anonymization approach is generalization, which replaces quasi identifier values with values that are less specific but semantically consistent. As a result, more records share the same set of quasi identifier values. A set of records whose quasi identifier values are identical forms an equivalence class.

To limit disclosure effectively, it is necessary to measure the disclosure risk of the anonymized table. Samarati et al. introduced an anonymization technique called k-anonymity [4,5,6], which prevents identity disclosure but is not sufficient to prevent attribute disclosure. To address this, Machanavajjhala et al. [7] introduced a new notion of privacy called l-diversity.
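The generalization step and the two privacy notions can be illustrated with a minimal Python sketch. The toy records, the attribute choices (age and ZIP code as quasi identifiers, disease as the sensitive attribute), the generalization rule and all function names are assumptions made for illustration; they are not taken from the reviewed papers. The sketch uses the usual reading of k-anonymity (every equivalence class contains at least k records) and, as a further assumption, the simplest "distinct" variant of l-diversity (every equivalence class contains at least l distinct sensitive values).

```python
from collections import defaultdict

# Hypothetical toy table with explicit identifiers already removed.
# Each record is (age, zip_code, disease): age and zip_code are treated as
# quasi identifiers, disease as the sensitive attribute.
RECORDS = [
    (23, "47677", "Flu"),
    (27, "47602", "Flu"),
    (29, "47678", "Flu"),
    (43, "47905", "Cancer"),
    (47, "47909", "Flu"),
    (41, "47906", "Gastritis"),
]


def generalize(age, zip_code):
    """Replace quasi identifier values with less specific ones.

    Assumed rule: ages become 10-year ranges, ZIP codes keep 3 digits.
    """
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", zip_code[:3] + "**")


def equivalence_classes(records):
    """Group sensitive values by their generalized quasi identifier tuple."""
    classes = defaultdict(list)
    for age, zip_code, disease in records:
        classes[generalize(age, zip_code)].append(disease)
    return dict(classes)


def achieved_k(classes):
    """Largest k for which the table is k-anonymous (smallest class size)."""
    return min(len(values) for values in classes.values())


def achieved_l(classes):
    """Largest l under distinct l-diversity (fewest distinct sensitive values)."""
    return min(len(set(values)) for values in classes.values())


if __name__ == "__main__":
    classes = equivalence_classes(RECORDS)
    for quasi_identifier, diseases in classes.items():
        print(quasi_identifier, diseases)
    print("k =", achieved_k(classes), "l =", achieved_l(classes))
```

In this toy table the generalized records fall into two equivalence classes of three records each, so the table is 3-anonymous. Every record in the first class carries the same disease, however, so an adversary who knows that a person belongs to that class learns the disease without identifying the exact record. This is the attribute disclosure that k-anonymity alone does not prevent and that l-diversity is meant to address.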
[Figure 1 labels: Original Dataset; Data Publisher; After Applying Anonymization Technique; Privacy Preserved Data; Data Recipient]
[Figure 2 labels: Anonymization Techniques; Centralized Data; Distributed Data; Social Network Data]