electronics Article Generating Synthetic ECGs Using GANs for Anonymizing Healthcare Data Esteban Piacentino , Alvaro Guarner and Cecilio Angulo *   Citation: Piacentino, E.; Guarner, A.; Angulo, C. Generating Synthetic ECGs Using GANs for Anonymizing Healthcare Data. Electronics 2021, 10, 389. https://doi.org/10.3390/ electronics10040389 Academic Editors: Nicola Francesco Lopomo and Pawel Strumillo Received: 10 November 2020 Accepted: 30 January 2021 Published: 5 February 2021 Publisher’s Note: MDPI stays neu- tral with regard to jurisdictional clai- ms in published maps and institutio- nal affiliations. Copyright: © 2021 by the authors. Li- censee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and con- ditions of the Creative Commons At- tribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/). Intelligent Data Science and Artificial Intelligence Research Centre, Universitat Politècnica de Catalunya, 08034 Barcelona, Spain; estebanpiacentino@gmail.com (E.P.); alvaroguarner@hotmail.com (A.G.) * Correspondence: cecilio.angulo@upc.edu † These authors contributed equally to this work. Abstract: In personalized healthcare, an ecosystem for the manipulation of reliable and safe private data should be orchestrated. This paper describes an approach for the generation of synthetic electrocardiograms (ECGs) based on Generative Adversarial Networks (GANs) with the objective of anonymizing users’ information for privacy issues. This is intended to create valuable data that can be used both in educational and research areas, while avoiding the risk of a sensitive data leakage. As GANs are mainly exploited on images and video frames, we are proposing general raw data processing after transformation into an image, so it can be managed through a GAN, then decoded back to the original data domain. The feasibility of our transformation and processing hypothesis is primarily demonstrated. Next, from the proposed procedure, main drawbacks for each step in the procedure are addressed for the particular case of ECGs. Hence, a novel research pathway on health data anonymization using GANs is opened and further straightforward developments are expected. Keywords: GAN; ECG; anonymization; healthcare data; sensors; data transformation 1. Introduction In recent years there has been a huge proliferation of solutions that store and process personal health data and infer knowledge, from mobile health apps to smart wearable sensors [1]. In Reference [2], a proposal is introduced to orchestrate an ecosystem of manipulation of reliable and safe data, applied to the field of health, proposing the creation of digital twins for personalized healthcare [3]. One of the elements to be considered in health-related projects is data privacy for ethical issues. Sources of medical data in health services are causing important concerns, the main one being privacy and legal issues when sharing and reporting health information of patients. However, an accurate diagnosis depends on the quantity and quality of the information about a patient, as well as extensive medical knowledge. In this context, anonymization arises as a tool to mitigate the risks of obtaining and massively processing personal data [4]. We propose GAN-based anonymization [5] of private health data, so a seedbed would be obtained from the training data that allow not only to capture informa- tion from the original data, but to generate new information with a similar behaviour. Generative Adversarial Network (GAN) algorithms have arisen in 2014 [6] and, since then, have been highlighted as potential alternatives for data augmentation [7] and missing data problems [8], among other applications, due to their outstanding capabilities on generating realistic data instances, mostly images. Following these applications, a question raised about the feasibility of using GAN systems to generate synthetic data, not necessarily images, that imitates the attributes of a private health database. If possible, this generated machine would be a very useful tool as it is enabling unlimited similar-to-the-original data without compromising the privacy of the original elements. The applications of this tool could range from educational purposes to scientific simulations and investigations, as sensitive data from any field could be available without a risk of private data leakage. Electronics 2021, 10, 389. https://doi.org/10.3390/electronics10040389 https://www.mdpi.com/journal/electronics