An ontology-based record linkage method for textual microdata

Sergio Martínez a,1, Aida Valls a and David Sánchez a

a Department of Computer Science and Mathematics, Universitat Rovira i Virgili. Intelligent Technologies for Advanced Knowledge Acquisition (ITAKA) research group. Av. Països Catalans, 26. 43007 Tarragona, Catalonia (Spain)

Abstract. Disclosure control is a critical aspect when publishing information from databases (i.e., microdata), because they store private information about individuals. The goal of a privacy-preserving method is to avoid the re-identification of individuals from the published data. Several disclosure control methods to mask published data have been developed. To evaluate the quality of the anonymization process, disclosure risk methods measure the capacity of an intruder to link the records in the original dataset with those in the masked one. Record linkage methods proposed in the literature focus only on numerical and ordinal data. In this paper we present a new record linkage method for textual data that exploits the semantics of the values using ontologies. It relies on the theory of semantic similarity to propose linkages between the original and the masked records. The paper compares the results obtained with our method with those given by a traditional non-semantic approach. The evaluation shows that semantic-based record linkage is better able to evaluate the disclosure risk of masking methods dealing with textual microdata.

Keywords. Privacy protection, disclosure risk, knowledge-based systems, ontologies, semantic similarity.

Introduction

Social and economic studies require large and detailed data (i.e., microdata) about individuals. Statistical agencies gather this data from polls, questionnaires or usage logs, which must be properly anonymized before being made public.
Protecting the identity of individuals is critical because, in many situations, datasets contain confidential personal information. The goal of statistical disclosure control methods is to prevent an intruder from re-identifying an individual from the published data and thereby associating or retrieving that individual's confidential information. Anonymization methods are the current means of protecting a person's identity [21]. Several anonymization techniques, based on masking the original data, have been developed to minimize the re-identification risk [4].

1 Corresponding Author.
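The idea of measuring disclosure risk through record linkage, as summarized in the abstract, can be illustrated with a minimal sketch. It is not the method proposed in this paper: it is a generic nearest-record linkage with a non-semantic baseline distance, and all names (`exact_match_distance`, `linkage_rate`) and the toy records are hypothetical. The sketch assumes that the masked record at position i was derived from the original record at position i, and that ties are resolved in favour of the first candidate.

```python
from typing import Callable, Sequence

Record = Sequence[str]


def exact_match_distance(a: Record, b: Record) -> float:
    """Non-semantic baseline (assumption): fraction of attributes whose
    textual values are not identical."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def linkage_rate(
    original: Sequence[Record],
    masked: Sequence[Record],
    distance: Callable[[Record, Record], float],
) -> float:
    """Share of masked records whose nearest original record is their true
    source. Assumes masked[i] comes from original[i] and both are non-empty.
    Ties go to the lowest index, which is a simplification."""
    correct = 0
    for i, masked_record in enumerate(masked):
        # The intruder links the masked record to the closest original one.
        nearest = min(
            range(len(original)),
            key=lambda j: distance(original[j], masked_record),
        )
        if nearest == i:
            correct += 1
    return correct / len(masked)


# Toy data, invented for illustration only.
original = [("flu", "teacher"), ("asthma", "nurse"), ("fracture", "lawyer")]
masked = [("flu", "educator"), ("disease", "nurse"), ("fracture", "lawyer")]

print(linkage_rate(original, masked, exact_match_distance))
```

The distance is passed in as a parameter so that a semantic measure computed over an ontology could be substituted for the exact-match baseline without changing the linkage loop. A higher linkage rate indicates a higher estimated disclosure risk for the masked dataset.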