Inference Control in a Diabetes Data Set Using a Java-Based Prototype of LDH Algorithm

Georgios FERETZAKIS¹, Konstantinos MITROPOULOS, Dimitris KALLES and Vassilios S. VERYKIOS
School of Science and Technology, Hellenic Open University, Patras 263 35, Greece; georgios.feretzakis@ac.eap.gr; kmitrop@otenet.gr; kalles@eap.gr; verykios@eap.gr

Abstract. Data sharing among different entities in the healthcare domain has become an increasingly common practice, and each entity would most likely want to prevent indirect data disclosure via inference channels. The Local Distortion Hiding (LDH) algorithm was developed to protect sensitive decision tree (DT) rules, that is, rules chosen not to be disclosed when DT construction techniques are applied to the data. This article presents eight experiments in which a Java-based prototype implementing the LDH algorithm is applied to a diabetes data set. The experiments test the LDH algorithm in two respects: first, its ability to control inference and, second, its ability to maintain the structure and the performance metrics of the resulting DT. Our experiments on hiding eight terminal nodes in the diabetes data set yield satisfactory results.

Keywords. Inference control; data security; privacy-preserving; machine learning

1. Introduction and Background

The healthcare sector is being digitally transformed by technological advances in medical information systems, electronic medical records, wearables, and mobile devices. The growth in the amount of global healthcare data, together with advances in machine learning (ML) and data analytics, allows researchers and clinicians to extract and visualize large-scale medical data in new ways [1]. The Internet facilitates the transfer and exchange of these data, as well as the delivery of healthcare services and applications, thereby linking patients and healthcare providers.
While such ecosystems promise a future of widely accessible and more innovative healthcare, the privacy of patients, physicians, nurses, and other healthcare professionals is today more than ever a concern [2]. Data privacy is a critical issue in health informatics, particularly when analyzing data sets collected from various sources, such as healthcare providers, insurance companies, pharmaceutical companies, and research institutions. Data sharing among different entities in the healthcare domain has become an increasingly common practice, and each entity would most likely want to prevent indirect data disclosure via inference channels. The extraction of knowledge from patients' personal data for research purposes should be carried out safely and with absolute privacy.

¹ Corresponding Author, Georgios FERETZAKIS, PhD; E-mail: georgios.feretzakis@ac.eap.gr

Informatics and Technology in Clinical Care and Public Health, J. Mantas et al. (Eds.). © 2022 The authors and IOS Press. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0). doi:10.3233/SHTI210946

Privacy-preserving