IJSRSET19629 | Received : 03 March 2019 | Accepted : 13 March 2019 | March-April -2019 [ 6 (2) : 84-90 ]
© 2019 IJSRSET | Volume 6 | Issue 2 | Print ISSN: 2395-1990 | Online ISSN : 2394-4099
Themed Section : Engineering and Technology
DOI : https://doi.org/10.32628/IJSRSET19629
A Study on Models and Techniques of Anonymization in Data
Publishing
Shipra Sharma, Naveen Choudhary, Kalpana Jain
Department of CSE, College of Technology and Engineering, Udaipur, Rajasthan, India
ABSTRACT
In an era in which much of the world runs online, the storing and publishing of data online has increased greatly.
A large amount of information is collected and published to networks that are publicly available. With this
exposure comes the risk of leaking information about individuals when the data is published. A mechanism is
therefore needed to preserve the privacy of individuals, and this need gave rise to the concept of
privacy-preserving data publishing. To achieve this privacy, different privacy models and techniques have been
proposed, which give different levels of resistance against different attacks by adversaries. In this paper we
discuss these models and techniques and present a comparative study of them.
Keywords : Privacy Models, Anonymization Techniques, Data Publishing, Privacy Preservation.
I. INTRODUCTION
Publishing data means making it available for public
use in further research, studies or surveys. When the
data is published, however, the identity of individuals
must be protected to maintain their privacy. Protecting
privacy causes some loss of information in the data
and so decreases its utility. The major challenge in
this field is therefore to preserve privacy with
minimum data loss.
Modifying the data before publication so that it does
not reveal the identity of any individual, that is,
making it anonymous, is a process called
anonymization. Before anonymizing data, however, we
need to understand the different types of data that exist.
1. Identifier: Fields or values that uniquely
identify an individual are called identifiers, for
example name or social security number.
2. Quasi Identifier: Values that do not directly
identify an individual but that can lead to identity
disclosure when linked with an external data set, as
shown in Fig. 1.
Fig. 1: Quasi identifier linkage example.
3. Sensitive Attribute: Values that a person does
not want to disclose or share, for example disease
or salary.
4. Non Sensitive Attribute: Details that would not harm
the individual even if leaked are non-sensitive attributes.
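The role of quasi identifiers can be illustrated with a minimal sketch of a linkage attack. The tables, attribute names (zip, birth_year, gender, disease), personal names and the link function below are all hypothetical and chosen only for illustration; they are not taken from Fig. 1 or from any real data set. The released table has no identifier column, yet joining it with an external table on the quasi identifiers can still attach a name to a sensitive value.

```python
# Hypothetical released table: the identifier (name) has been dropped,
# but the quasi identifiers (zip, birth_year, gender) are left intact.
released = [
    {"zip": "313001", "birth_year": 1985, "gender": "F", "disease": "Flu"},
    {"zip": "313002", "birth_year": 1990, "gender": "M", "disease": "Diabetes"},
    {"zip": "313001", "birth_year": 1990, "gender": "M", "disease": "Asthma"},
]

# Hypothetical external data set that carries names alongside
# the same quasi identifiers.
external = [
    {"name": "Asha", "zip": "313001", "birth_year": 1985, "gender": "F"},
    {"name": "Ravi", "zip": "313002", "birth_year": 1990, "gender": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def link(released_rows, external_rows, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return (name, disease) pairs for people whose quasi-identifier
    values match exactly one row of the released table."""
    # Group released rows by their combination of quasi-identifier values.
    index = {}
    for row in released_rows:
        key = tuple(row[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(row)

    matches = []
    for person in external_rows:
        key = tuple(person[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        # A single candidate means the person is re-identified.
        if len(candidates) == 1:
            matches.append((person["name"], candidates[0]["disease"]))
    return matches


reidentified = link(released, external)
```

In this toy example both people in the external table are re-identified, because each of their quasi-identifier combinations is unique in the released table. If several released rows shared the same combination, the match would no longer be unique and the sketch would not report it.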
Hence, in anonymization we remove the identifier
fields from the data set so that no direct identification
of an individual is possible. We then modify the
quasi identifiers to prevent linkage attacks before
publishing the data. Table 1 shows an example of