Statistical Disclosure Control Methods for Microdata
Oleg Chertov 1+ and Anastasiya Pilipyuk 1
1 National Technical University “Kyiv Polytechnic Institute”, Kyiv, Ukraine
Abstract. In this paper we formulate three basic tasks of statistical disclosure control for microdata,
analyze existing methods for achieving an optimal balance between minimal disclosure risk and minimal
information loss, and substantiate the feasibility of masking methods based on the wavelet transform of
microdata.
Keywords: Statistical Disclosure Control, Microdata, Microfile, Masking Methods, Wavelet Transform.
1. Introduction
Over the last two decades, active research has been conducted in the area of data mining, i.e. extracting
hidden patterns from data. Let us draw attention to the opposite direction, “disclosure limitation”. Classical
methods of information encryption or security (organizational, technical, with conventional or electronic keys,
steganography, etc.) are used to make data access hard or impossible. In this paper we instead consider a
transformation of a data sample released to any user that preserves all the main features of the sample while
ensuring disclosure control of confidential information.
The need for such data transformations arises when users get access not only to analytical results (in
the form of tables, plots, diagrams, etc.) but also to the original data. Lately this situation has become more
widespread in the release of results of various statistical and sociological surveys. A representative example
is the IPUMS-International project, for which data have been gathered from 130 censuses in 44 countries.
The project database contains 279 million personal records [1].
The usual approach to protecting released data is to distort or mask them in some way before
publication. Methods that perform such distortion are called statistical disclosure control
methods.
We single out three current tasks of statistical disclosure control. The first is ensuring the anonymity of an
individual respondent. For example, suppose somebody knows only a respondent's age range, exact number of
children, and membership among top-ranking officials. Then it is possible to identify in the dataset the record
related to the current President of Ukraine, because he has 5 children.
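The re-identification risk in this example can be illustrated with a minimal sketch. It is not a method from this paper: the toy microfile, the field names (`age_range`, `children`, `top_official`) and the helper `unique_records` are all hypothetical and serve only to show how a combination of attribute values that occurs exactly once singles out a respondent.

```python
from collections import Counter

# Hypothetical toy microfile; every value is invented for illustration.
records = [
    {"age_range": "50-59", "children": 2, "top_official": True},
    {"age_range": "50-59", "children": 5, "top_official": True},
    {"age_range": "50-59", "children": 2, "top_official": True},
    {"age_range": "40-49", "children": 1, "top_official": False},
    {"age_range": "40-49", "children": 1, "top_official": False},
]

# Attributes an outside observer is assumed to know (an assumption of this sketch).
QUASI_IDENTIFIERS = ("age_range", "children", "top_official")


def unique_records(rows, keys):
    """Return the rows whose combination of key values occurs exactly once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] == 1]


# Records that can be singled out from the known attributes alone.
at_risk = unique_records(records, QUASI_IDENTIFIERS)
print(at_risk)
```

In this toy file only the record with 5 children is unique on the three known attributes, so it is the one exposed to identification.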
The second task is ensuring group anonymity. For example, it may be necessary to prevent the geographical
location of cantonments from being disclosed through a calculation of the concentration of military-aged
youth.
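The group-level risk can be sketched in the same hedged way. The snippet below is an illustration only: the region labels, the ages, the assumed age band of 18 to 27 and the helper `share_by_region` are hypothetical and do not come from this paper. It computes the share of military-aged respondents in each region and shows how an unusually high share points to one region.

```python
from collections import defaultdict

# Hypothetical (region, age) records; all values are invented for illustration.
people = [
    ("A", 19), ("A", 20), ("A", 21), ("A", 45),
    ("B", 19), ("B", 50), ("B", 62), ("B", 33),
]

# Assumed age band for "military-aged youth" (18 to 27 inclusive).
MILITARY_AGE = range(18, 28)


def share_by_region(rows, ages):
    """Return, for each region, the fraction of respondents whose age is in `ages`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for region, age in rows:
        totals[region] += 1
        if age in ages:
            hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}


shares = share_by_region(people, MILITARY_AGE)
# The region with the highest concentration is the one a group-level attack would flag.
outlier = max(shares, key=shares.get)
print(shares, outlier)
```

No single record is identified here; what leaks is a property of the group, which is why this task is distinct from individual anonymity.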
The third task is ensuring a reverse transformation, i.e. a transformation from the masked data back to the
original data. It is needed when the primary sample cannot otherwise be reconstructed, for example because
some of the information is no longer current.
2. Problem Definition
+ Tel.: +38067-3094309; fax: +38044-2419658
E-mail address: chertov@i.ua.
2009 International Symposium on Computing, Communication, and Control (ISCCC 2009)
Proc. of CSIT vol.1 (2011) © (2011) IACSIT Press, Singapore