HiMod-Pert: Histogram Modification Based Perturbation
Approach for Privacy Preserving Data Mining
Alpa Kavin Shah 1 (✉) and Ravi Gulati 2
1 MCA Department, Sarvajanik College of Engineering and Technology, Surat, India
alpa.shah@scet.ac.in
2 Department of Computer Science, Veer Narmad South Gujarat University, Surat, India
rmgulati@vnsgu.ac.in
Abstract. Privacy Preserving Data Mining (PPDM) prevents the disclosure of
sensitive quasi-identifiers of a dataset during mining by perturbing the data. The
perturbed dataset is then used by a trusted third party to derive association
rules effectively. Many PPDM algorithms destroy the original data in order to
generate the mining results. It is essential that the perturbed data preserve the
statistical inference of the sensitive attributes and minimize information loss.
Existing techniques based on Additive, Multiplicative and Geometric
Transformations have minimal information loss, but suffer from reconstruction
vulnerabilities. We propose a Histogram Modification based method, viz.
HiMod-Pert, for preserving the sensitive numeric attributes of the perturbed
dataset. Our method uses the difference between neighboring values to determine
the perturbation factor. Experiments are performed to implement the proposed
technique and test its applicability. Evaluation using descriptive statistical
metrics shows that the information loss is minimal.
Keywords: Privacy preserving data mining · Histogram Modification ·
Additive white Gaussian noise · Multiplicative perturbation ·
Geometric Data Perturbation
1 Introduction
Over the last couple of decades, information collection over the Internet has witnessed
exponential growth. More users now provide their personal information in different
Internet-based activities such as purchases and sales, auctions, entertainment, gaming
and online surveys, to name a few. A person can now be easily and accurately linked
to his or her Internet activities, which poses a serious threat of privacy intrusion to
individuals. This vast pool of data has necessitated efficient data mining protocols.
Data mining, which was once confined to the narrower domain of enterprises and
applications, now encompasses Big Data and Cloud Computing.
Data collection has increased manifold for research, trend analysis and, more often,
collaborative mining. It is vital that the information users provide does not
compromise their privacy. This concern has caught the attention of researchers and
remains widely studied today. PPDM algorithms tackle this issue by optimizing
© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2018
Z. Patel and S. Gupta (Eds.): ICFITT 2017, LNICST 220, pp. 28–36, 2018.
https://doi.org/10.1007/978-3-319-73712-6_3