Statistical Disclosure Control Methods for Microdata

Oleg Chertov¹⁺ and Anastasiya Pilipyuk¹
¹ National Technical University “Kyiv Polytechnic Institute”, Kyiv, Ukraine

Abstract. In this paper we formulate three basic tasks of statistical disclosure control for microdata, analyze existing methods for achieving an optimal trade-off between minimal disclosure risk and minimal information loss, and substantiate the applicability of masking methods connected with the wavelet transform of microdata.

Keywords: Statistical Disclosure Control, Microdata, Microfile, Masking Methods, Wavelet Transform.

1. Introduction

Over the last two decades, active research has been conducted in the area of data mining, i.e., extracting hidden patterns from data. Let us draw attention to the opposite direction: “disclosure limitation”. Classical methods of information encryption and security (organizational, technical, with conventional or electronic keys, steganography, etc.) are used to make data access hard or impossible. In this paper, by contrast, we consider a transformation of a data sample released to arbitrary users that preserves all the main features of the sample while ensuring disclosure control of confidential information.

The need for such data transformations arises when users get access not only to analytical results (in the form of tables, plots, diagrams, etc.) but also to the original data. Lately this situation has become more widespread in the release of results of various statistical and sociological investigations. A relevant example is the IPUMS-International project, in which data have been gathered from 130 censuses in 44 countries. The project database contains 279 million personal records [1].

The usual approach to protecting released data is to distort or mask them in some way before publication. The methods that attempt to perform such distortion are called statistical disclosure control methods. We single out three topical tasks of statistical disclosure control.
The first is ensuring the anonymity of individual respondents. For example, suppose somebody knows only a respondent's age range, exact number of children, and membership among top-ranking officials. Then it is possible to identify the record in the dataset that relates to the current President of Ukraine, because he has five children.

The second task is ensuring group anonymity. For example, it may be necessary to prevent the geographical location of military cantonments from being disclosed through a calculation of the concentration of military-aged youth.

The third task is ensuring a reverse transformation, i.e., a transformation from the masked data back to the original data. It is needed when reconstructing the primary sample is impossible, for example, because some information is no longer up to date.

⁺ Tel.: +38067-3094309; fax: +38044-2419658. E-mail address: chertov@i.ua.
2009 International Symposium on Computing, Communication, and Control (ISCCC 2009). Proc. of CSIT vol. 1 (2011) © (2011) IACSIT Press, Singapore

2. Problem Definition
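As an illustration of the first two tasks named in the Introduction, the following minimal Python sketch shows how an intruder's background knowledge could single out one record, and how a per-region age concentration could reveal a group. It is not taken from the paper: the toy microfile, the field names (`region`, `age`, `children`, `occupation`), the age bounds, and both helper functions are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy microfile; every field name and value is invented.
RECORDS = [
    {"region": "A", "age": 58, "children": 5, "occupation": "top official"},
    {"region": "A", "age": 61, "children": 2, "occupation": "top official"},
    {"region": "A", "age": 19, "children": 0, "occupation": "conscript"},
    {"region": "B", "age": 20, "children": 0, "occupation": "conscript"},
    {"region": "B", "age": 19, "children": 0, "occupation": "conscript"},
    {"region": "B", "age": 45, "children": 2, "occupation": "teacher"},
]


def matching_records(records, age_range, children, occupation):
    """Return the records consistent with an intruder's background knowledge.

    A single match means the respondent can be re-identified
    (individual anonymity is violated).
    """
    low, high = age_range
    return [
        r for r in records
        if low <= r["age"] <= high
        and r["children"] == children
        and r["occupation"] == occupation
    ]


def share_by_region(records, age_range):
    """Return, per region, the fraction of records whose age is in age_range.

    An unusually high share in one region may disclose a group
    (group anonymity is violated).
    """
    low, high = age_range
    totals = Counter(r["region"] for r in records)
    in_range = Counter(
        r["region"] for r in records if low <= r["age"] <= high
    )
    return {region: in_range[region] / totals[region] for region in totals}


# Individual anonymity: age range + number of children + occupation.
candidates = matching_records(RECORDS, (50, 65), 5, "top official")

# Group anonymity: concentration of assumed military-aged (18-25) respondents.
shares = share_by_region(RECORDS, (18, 25))
```

In this toy data the attribute combination matches exactly one record, and region B shows a higher share of 18 to 25 year olds than region A. These are the two kinds of disclosure that a masking method would have to prevent.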