electronics
Article
Generating Synthetic ECGs Using GANs for Anonymizing Healthcare Data
Esteban Piacentino †, Alvaro Guarner † and Cecilio Angulo *
Citation: Piacentino, E.; Guarner, A.; Angulo, C. Generating Synthetic ECGs Using GANs for Anonymizing Healthcare Data. Electronics 2021, 10, 389. https://doi.org/10.3390/electronics10040389
Academic Editors: Nicola Francesco Lopomo and Pawel Strumillo
Received: 10 November 2020
Accepted: 30 January 2021
Published: 5 February 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Intelligent Data Science and Artificial Intelligence Research Centre, Universitat Politècnica de Catalunya,
08034 Barcelona, Spain; estebanpiacentino@gmail.com (E.P.); alvaroguarner@hotmail.com (A.G.)
* Correspondence: cecilio.angulo@upc.edu
† These authors contributed equally to this work.
Abstract: In personalized healthcare, an ecosystem for the manipulation of reliable and safe private
data should be orchestrated. This paper describes an approach for generating synthetic
electrocardiograms (ECGs) based on Generative Adversarial Networks (GANs), with the objective of
anonymizing users’ information to preserve privacy. The aim is to create valuable data that can
be used in both education and research while avoiding the risk of sensitive data leakage.
As GANs are mainly applied to images and video frames, we propose a general procedure in which
raw data are transformed into an image, so that they can be processed by a GAN, and then decoded
back to the original data domain. First, the feasibility of this transformation and processing
hypothesis is demonstrated. Next, the main drawbacks of each step in the proposed procedure are
addressed for the particular case of ECGs. Hence, a novel research pathway on health
data anonymization using GANs is opened, and further straightforward developments are expected.
Keywords: GAN; ECG; anonymization; healthcare data; sensors; data transformation
1. Introduction
In recent years there has been a huge proliferation of solutions that store and process
personal health data and infer knowledge from them, from mobile health apps to smart wearable
sensors [1]. Reference [2] proposes orchestrating an ecosystem for the manipulation of
reliable and safe data in the field of health, and advocates the creation
of digital twins for personalized healthcare [3].
One of the elements to be considered in health-related projects is data privacy, for
ethical reasons. Sources of medical data in health services raise important concerns,
the main ones being the privacy and legal issues involved in sharing and reporting patients’
health information. However, an accurate diagnosis depends on the quantity and quality of the
information about a patient, as well as on extensive medical knowledge. In this context,
anonymization arises as a tool to mitigate the risks of obtaining and massively processing
personal data [4]. We propose GAN-based anonymization [5] of private health data, so that a
seedbed is obtained from the training data that allows not only capturing information
from the original data, but also generating new information with similar behaviour.
Generative Adversarial Network (GAN) algorithms were introduced in 2014 [6] and, since
then, have been highlighted as potential alternatives for data augmentation [7] and missing
data problems [8], among other applications, owing to their outstanding capability for
generating realistic data instances, mostly images. Following these applications, a question
arises about the feasibility of using GAN systems to generate synthetic data, not necessarily
images, that imitate the attributes of a private health database. If this is possible, such a
generative model would be a very useful tool, as it would provide an unlimited supply of data
similar to the original without compromising the privacy of the original elements. The applications of this
tool could range from educational purposes to scientific simulations and investigations, as
sensitive data from any field could be made available without the risk of private data leakage.
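To make the idea of handling non-image data with an image-oriented GAN more concrete, the following minimal Python sketch shows one possible round trip between a one-dimensional signal and a grayscale pixel grid. It is only an illustration under stated assumptions: the min-max scaling, the 8-bit quantization, the row-by-row layout, and the names `encode_signal` and `decode_image` are hypothetical choices made here, not the transformation used in this paper. The GAN itself is omitted; only the encoding and decoding steps are shown.

```python
import math


def encode_signal(samples, width):
    """Scale a 1D signal to 0..255 and lay it out row by row as a 2D grid.

    Hypothetical encoding for illustration only. Returns the grid and the
    metadata (minimum, maximum, original length) needed to invert it.
    """
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    pixels = [round((s - lo) / span * 255) for s in samples]
    height = math.ceil(len(pixels) / width)
    pixels += [0] * (height * width - len(pixels))  # zero-pad the last row
    image = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return image, (lo, hi, len(samples))


def decode_image(image, meta):
    """Invert encode_signal: flatten the grid, drop padding, undo the scaling."""
    lo, hi, n = meta
    flat = [p for row in image for p in row][:n]
    return [lo + p / 255 * (hi - lo) for p in flat]


# Usage: a synthetic sine wave stands in for a real recording.
signal = [math.sin(2 * math.pi * i / 50) for i in range(120)]
image, meta = encode_signal(signal, width=16)
restored = decode_image(image, meta)
worst_error = max(abs(a - b) for a, b in zip(signal, restored))
```

With this particular encoding the round trip is lossy only through quantization: the reconstruction error is bounded by half a grey level, that is, (max - min) / 510. Any image-based pipeline of this kind has to account for such encoding losses separately from the errors introduced by the generative model.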
Electronics 2021, 10, 389. https://doi.org/10.3390/electronics10040389 https://www.mdpi.com/journal/electronics