A Review of Frequency Table Disclosure Control from a Microdata Perspective

Alexander Latenko 1, Mortaza S. Bargh 2, Susan van den Braak 3, Marco Vink 4

1-4 Research and Documentation Centre, Ministry of Justice and Security, The Hague, The Netherlands
2 Rotterdam University of Applied Sciences, Research Center Creating 010, Rotterdam, The Netherlands

Email: 1 a.latenko@wodc.nl, 2 m.shoae.bargh@wodc.nl, 3 s.w.van.den.braak@wodc.nl, 4 m.e.vink@wodc.nl

Abstract—Protecting personal data is a key requirement for properly sharing and opening data. With growing concerns regarding privacy, it is important to ensure that the personal data of individuals are not compromised or made public in Open Data initiatives. For the most part, personal data protection for microdata and for tabular data have been researched separately. As a result, the two fields contain much overlapping research, particularly concerning the privacy and utility of the respective data types, and this overlapping research has not been well integrated between the fields. Recent developments and improvements in protecting microdata are not being applied to tabular data protection. In this work, the association between microdata and tabular data is formalized and used to link the personal data disclosure risks and the personal data protection models that can be applied to both microdata and tabular data.

Keywords–Data Protection; Disclosure Scenarios; Frequency Tables; Statistical Disclosure Control.

I. INTRODUCTION

Within the process of opening and sharing data, Statistical Disclosure Control (SDC) is applied to reduce the risk of privacy disclosures for individuals while preserving the quality and utility of the data. Minimizing the risk of privacy disclosures is an essential step in adhering to privacy regulations, such as the EU's General Data Protection Regulation (GDPR).
As essential as it is for a data controller, i.e., the entity that opens the data, to provide sufficient guarantees of privacy, it is perhaps just as essential for a data user to be provided with similar guarantees of the quality of the data. There are different reasons for opening or disseminating data, including improving transparency and enabling (scientific) research. Census tables are an example of opening data for transparency: the information in those tables influences public perception and should therefore be as informative as possible. Opening data not only facilitates research, but has also become increasingly necessary for academic work to be accepted for publication in certain journals [1].

SDC solutions are non-trivial in practical settings because identifying potential sources of disclosure is a difficult task. This becomes clear from recent cases where data subjects, i.e., the individuals present in the data, were first de-identified (anonymized) but were later re-identified by researchers [2]. Even when SDC methods have been applied to data, re-identification is still sometimes possible. Numerous cases have been discovered, including the infamous cases of disclosure in the microdata of taxi rides from NYC [3] and in tabular data containing sensitive health information [4].

To prevent re-identification, the sources and causes of personal data disclosures must first be identified. To this end, this work contributes a taxonomy of data disclosures that can occur when opening tabular data. Models such as t-closeness [5] and differential privacy [6] have been introduced to provide certain levels of privacy. Such models have mainly been introduced for protecting microdata. However, tabular data and microdata are closely related. We formalize the relation between the two data types.
This formalization makes it possible to evaluate the relation between SDC models developed for protecting microdata sets and those developed for protecting tabular data sets. Thus, this work advances the unification of the SDC methods and models developed for microdata and tabular data sets.

The contribution of this work is focused on frequency tables, which are the most general type of tabular data. The disclosure risks and privacy models for frequency tables mostly also hold for other types of tabular data, such as magnitude tables [7]. However, the disclosure risks that are specific to other types of tabular data are not considered in this work. To the best of our knowledge, this is the first work that aims at unifying the privacy models for microdata and those for tabular data, allowing for comparisons between the privacy models.

The rest of this work is organized as follows. Section II formalizes microdata and tabular data, specifically frequency tables. Section III introduces the concept of disclosure, and Section IV describes the attacks that cause personal data disclosures. Section V presents an overview of privacy models. Lastly, Section VI concludes this work and discusses possible future work.

II. DATA ASSOCIATION

In order to unambiguously describe the scenarios where personal data disclosures may take place for tabular data sets, the concepts of microdata and tabular data are formalized in this section.

A. Microdata

A microdata set $DS_M$ comprises $N$ rows, or records, denoted by $x_n$, where $n = 1, \dots, N$ and every record $x_n$ corresponds to one individual. Further, every record $x_n$ comprises $D$ attributes. An attribute is denoted by $a_i$, where $i = 1, \dots, D$. An attribute $a_i$ has an associated domain $A_i$ of nominal or ordinal values. The domain $A = A_1 \times A_2 \times \dots \times A_D$ denotes the super domain, which contains all attribute values in $DS_M$.
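As an illustration of the definitions above, the following Python sketch represents a small microdata set together with its attribute domains and the resulting super domain. The attribute names, domain values, and records are hypothetical and chosen only for illustration; they are not taken from the paper or from Table I, and the helper `is_valid_record` is an assumed name, not part of any formal model.

```python
from itertools import product

# Hypothetical attribute domains A_1, ..., A_D (illustrative values only).
domains = {
    "age_group": ["0-17", "18-64", "65+"],   # ordinal attribute
    "gender": ["F", "M"],                     # nominal attribute
    "offence": ["none", "theft", "fraud"],    # nominal attribute
}

# Super domain A = A_1 x A_2 x ... x A_D: every combination of attribute values.
super_domain = set(product(*domains.values()))

# Microdata set DS_M: one record per individual, each holding D attribute values.
microdata = [
    ("18-64", "F", "none"),
    ("18-64", "M", "theft"),
    ("65+", "F", "none"),
    ("0-17", "M", "fraud"),
]

N = len(microdata)  # number of records
D = len(domains)    # number of attributes


def is_valid_record(record):
    """Return True if the record has D values and lies in the super domain A."""
    return len(record) == D and tuple(record) in super_domain


# Every record of the microdata set must be an element of the super domain.
assert all(is_valid_record(x) for x in microdata)
```

Enumerating the super domain explicitly is only feasible for small domains; it is done here to keep the correspondence with the definition of $A$ visible.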
Every record $x_n$ is defined over $A$ and consists of the attribute values $x_{n,1}, x_{n,2}, \dots, x_{n,D}$, where $x_{n,i} \in A_i$ for $i = 1, \dots, D$. Table I is an example of a microdata table.

In the SDC literature for microdata, the set of attributes $\{a_1, a_2, \dots, a_D\}$ is generally divided into four disjoint sets: explicit identifiers, quasi identifiers, sensitive attributes, and non-sensitive attributes. Explicit Identifiers (EIDs) refer
Copyright (c) IARIA, 2020. ISBN: 978-1-61208-760-3. ICDS 2020: The Fourteenth International Conference on Digital Society