C. Fyfe et al. (Eds.): IDEAL 2008, LNCS 5326, pp. 250 – 257, 2008.
© Springer-Verlag Berlin Heidelberg 2008
A Data Perturbation Method by Field Rotation and
Binning by Averages Strategy for Privacy Preservation
Mohammad Ali Kadampur and Somayajulu D.V.L.N.
Department of Computer Science and Engineering
National Institute of Technology Warangal-506004, A.P. India
ali.kadmpur@gmail, soma@nitw.ac.in
www.nitw.ac.in
Abstract. In this paper a novel technique for guaranteeing the privacy of sensitive
data, with a specific focus on numeric databases, is presented. It is observed that
analysts and decision makers are interested in summary values of the data rather
than in the actual values. The proposed method assumes that most of the information
lies in the associations between attributes rather than in their individual values.
It therefore perturbs attribute associations in a controlled way, by shifting the
data values of specific columns through field rotation. The number of rotations is
determined using a support function for association rule handling and an algorithm
that computes the best-choice rotation dynamically. Final summary statistics of the
numeric data, such as the average and standard deviation, are preserved by replacing
the actual values with bin averages. The methods are tested on selected datasets and
the results are reported.
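The two steps summarized in the abstract can be illustrated with a small sketch. It is a minimal Python example under stated assumptions, not the authors' algorithm: the function names (`rotate_column`, `support`, `best_rotation`, `bin_average`), the predicate-based support, the selection rule (pick the nonzero shift that minimizes the support of one hypothetical sensitive itemset), equal-frequency bins of a fixed `bin_size`, and the sample data are all illustrative choices. Field rotation is modelled as a cyclic shift of one column, which changes which values co-occur in a row while leaving that column's multiset of values unchanged.

```python
def rotate_column(values, k):
    """Field rotation (sketch): cyclically shift a column down by k positions."""
    values = list(values)
    n = len(values)
    if n == 0:
        return []
    k %= n
    # The last k entries wrap around to the top; the multiset of values is unchanged.
    return values[n - k:] + values[:n - k]


def support(rows, predicate):
    """Support of an itemset: fraction of rows for which predicate(row) is true."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


def best_rotation(col_a, col_b, predicate):
    """HYPOTHETICAL selection rule: try every nonzero shift of col_b and keep the
    one that minimises the support of the sensitive itemset (ties -> smallest k)."""
    best_k, best_s = None, None
    for k in range(1, len(col_b)):
        rows = list(zip(col_a, rotate_column(col_b, k)))
        s = support(rows, predicate)
        if best_s is None or s < best_s:
            best_k, best_s = k, s
    return best_k


def bin_average(values, bin_size):
    """Equal-frequency binning (assumed): sort values, cut them into bins of
    bin_size, and replace each value with its bin mean in its original row."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        avg = sum(values[i] for i in idx) / len(idx)
        for i in idx:
            out[i] = avg
    return out


if __name__ == "__main__":
    from statistics import mean, pstdev

    # Made-up sample data: two numeric attributes of six records.
    age = [23, 35, 47, 52, 61, 29]
    income = [30, 42, 55, 60, 70, 38]

    def sensitive(row):
        # Hypothetical sensitive association: age > 40 AND income > 50.
        return row[0] > 40 and row[1] > 50

    print(support(list(zip(age, income)), sensitive))  # 0.5
    k = best_rotation(age, income, sensitive)
    rotated = rotate_column(income, k)
    print(k, support(list(zip(age, rotated)), sensitive))  # 3 0.0
    released = bin_average(rotated, bin_size=2)
    print(released)  # [65.0, 65.0, 34.0, 34.0, 48.5, 48.5]
    print(mean(income), mean(released))
    print(pstdev(income), pstdev(released))
```

Because each bin keeps its sum, the bin-averaged column has the same mean as the input. Its population standard deviation can only shrink, since the spread inside each bin is removed, so in this sketch the standard deviation is approximated rather than reproduced exactly, and the gap grows as `bin_size` increases.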
1 Introduction
Privacy is defined as “freedom from unauthorized intrusion” [15]. In the process of
knowledge extraction, it acts as a safeguard against the disclosure of individually
identifiable data. Data mining technology is used to extract knowledge from vast
quantities of data. However, the use of this technology has raised the concern that
individual privacy may be violated. Therefore, a data mining technique must ensure that
any information it discloses
1. cannot be traced to an individual; or
2. does not constitute an intrusion.
There are multiple approaches to achieving these goals [15]. Data perturbation is one
of the methods for preserving privacy [2][12][15]. In a perturbed database, even if data
is accessed without authorization, the true values are not disclosed. Data perturbation
techniques distort the data in various ways before presenting it to the data mining
algorithm, so that individually identifiable (private) values are not revealed. The
privacy-preserving properties of such databases are thus a result of the perturbation.
In this paper, a novel composite method for data perturbation is proposed.
2 Related Work
In order to distort the data and preserve individual privacy, researchers have employed
methods such as data encryption [11][13], data randomization [12][15], data swapping