Inference Control in a Diabetes Data Set Using a Java-Based Prototype of LDH Algorithm
Georgios FERETZAKIS¹, Konstantinos MITROPOULOS, Dimitris KALLES and
Vassilios S. VERYKIOS
School of Science and Technology, Hellenic Open University, Patras 263 35, Greece;
georgios.feretzakis@ac.eap.gr; kmitrop@otenet.gr; kalles@eap.gr; verykios@eap.gr
Abstract. Data sharing among different entities in the healthcare domain has
become increasingly common, and each entity will most likely want to prevent
indirect disclosure of its data via inference channels. The Local Distortion Hiding
(LDH) algorithm has been developed to protect sensitive decision tree (DT) rules,
that is, rules that are selected to remain undisclosed when DT construction
techniques are applied to the data. This article presents eight experiments in which
a Java-based prototype implementing the LDH algorithm is applied to a diabetes data
set. The experiments assess the LDH algorithm in two respects: first, inference
control, and second, preservation of the structure and the performance metrics of
the resulting DT. Hiding eight terminal nodes of the DT induced from the diabetes
data set with this prototype yields satisfactory results.
Keywords. Inference control; data security; privacy-preserving; machine learning
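As a rough illustration of the terms used in the abstract, the Java sketch below shows how each terminal node of a DT corresponds to a rule (the conjunction of tests on the root-to-leaf path, followed by the leaf's class label) and how one might naively check whether a sensitive rule is still exposed by a released tree. This is a hypothetical example, not the authors' prototype and not the LDH algorithm itself: the class `RuleDisclosureCheck`, its methods, the attribute names `glucose` and `bmi`, the thresholds, and the class labels are all invented for illustration, and the "released" tree is hand-written rather than produced by LDH.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: DT terminal nodes as rules, plus a naive disclosure check. */
public class RuleDisclosureCheck {

    /** Binary DT node: an internal test "attribute <= threshold", or a leaf carrying a class label. */
    public static final class Node {
        final String attribute; // null for leaves
        final double threshold;
        final Node left;        // followed when attribute <= threshold
        final Node right;       // followed when attribute > threshold
        final String label;     // non-null only for leaves

        private Node(String attribute, double threshold, Node left, Node right, String label) {
            this.attribute = attribute;
            this.threshold = threshold;
            this.left = left;
            this.right = right;
            this.label = label;
        }

        public static Node split(String attribute, double threshold, Node left, Node right) {
            return new Node(attribute, threshold, left, right, null);
        }

        public static Node leaf(String label) {
            return new Node(null, 0.0, null, null, label);
        }

        boolean isLeaf() {
            return label != null;
        }
    }

    /** Returns one rule per terminal node: the tests on its root-to-leaf path, then its label. */
    public static List<String> extractRules(Node root) {
        List<String> rules = new ArrayList<>();
        collect(root, new ArrayList<>(), rules);
        return rules;
    }

    private static void collect(Node node, List<String> path, List<String> rules) {
        if (node.isLeaf()) {
            String body = path.isEmpty() ? "TRUE" : String.join(" AND ", path);
            rules.add(body + " => " + node.label);
            return;
        }
        path.add(node.attribute + " <= " + node.threshold);                 // left-branch condition
        collect(node.left, path, rules);
        path.set(path.size() - 1, node.attribute + " > " + node.threshold); // right-branch condition
        collect(node.right, path, rules);
        path.remove(path.size() - 1);                                       // restore path for the caller
    }

    /** Naive check: the sensitive rule counts as disclosed if the released tree still yields it verbatim. */
    public static boolean isDisclosed(String sensitiveRule, Node releasedTree) {
        return extractRules(releasedTree).contains(sensitiveRule);
    }

    public static void main(String[] args) {
        // Hypothetical tree: attribute names, thresholds and labels are invented for illustration.
        Node original = Node.split("glucose", 127.5,
                Node.leaf("negative"),
                Node.split("bmi", 29.9, Node.leaf("negative"), Node.leaf("positive")));
        String sensitive = "glucose > 127.5 AND bmi > 29.9 => positive";

        // Hand-written stand-in for a released tree; NOT the output of the LDH algorithm.
        Node released = Node.split("glucose", 127.5, Node.leaf("negative"), Node.leaf("positive"));

        extractRules(original).forEach(System.out::println);
        System.out.println("disclosed in original: " + isDisclosed(sensitive, original)); // true
        System.out.println("disclosed in released: " + isDisclosed(sensitive, released)); // false
    }
}
```

Matching rules verbatim is a deliberate simplification: it only detects whether the sensitive terminal node survives unchanged. A fuller inference-control assessment would likely also need to consider whether the hidden rule can be approximately reconstructed from the released tree, while the second criterion in the abstract (preserving the DT's structure and performance metrics) is not addressed by this sketch at all.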
1. Introduction and Background
The healthcare sector is being digitally transformed by technological advances in
medical information systems, electronic medical records, wearables, and mobile devices.
The growing volume of healthcare data worldwide and advances in machine learning (ML)
and data analytics allow researchers and clinicians to extract knowledge from, and
visualize, large-scale medical data in new ways [1]. The Internet facilitates the transfer
and exchange of these data, as well as the delivery of healthcare services and
applications, thereby linking patients and healthcare providers. While such ecosystems
promise more widely accessible and innovative healthcare, the privacy of patients,
physicians, nurses, and other healthcare professionals is a greater concern today than
ever before [2]. Data privacy is a critical issue in health informatics, particularly
when analyzing data sets collected from various sources, such as healthcare providers,
insurance companies, pharmaceutical companies, and research institutions. Data sharing
among different entities in the healthcare domain has become increasingly common, and
each entity will most likely want to prevent indirect disclosure of its data via
inference channels. The extraction of knowledge from patients' personal data for
research purposes should be carried out securely and with full protection of privacy.
Privacy-preserving
¹ Corresponding Author, Georgios FERETZAKIS, PhD; E-mail: georgios.feretzakis@ac.eap.gr
Using a Java-Based Prototype of LDH
Informatics and Technology in Clinical Care and Public Health
J. Mantas et al. (Eds.)
© 2022 The authors and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/SHTI210946