Information

Article

Early Stage Identification of COVID-19 Patients in Mexico Using Machine Learning: A Case Study for the Tijuana General Hospital

Cristián Castillo-Olea 1,*, Roberto Conte-Galván 1, Clemente Zuñiga 2, Alexandra Siono 3, Angelica Huerta 4, Ornela Bardhi 5 and Eric Ortiz 6

1 Ensenada Center for Scientific Research and Higher Education, Ensenada 22860, Mexico; conte@cicese.mx
2 Tijuana General Hospital, Tijuana 22000, Mexico; drclementezuniga@gmail.com
3 Faculty of Engineering, CETYS University, Mexicali 21259, Mexico; alexandra.siono@cetys.edu.mx
4 Faculty of Medicine and Psychology, Autonomous University of Baja California, Mexicali 21100, Mexico; angelica.huerta.d@gmail.com
5 Independent Researcher, 1001 Tirana, Albania; alenroidhrab@gmail.com
6 comeMed Teleconsulting, Colonia Roma, Mexico City 6700, Mexico; dr.ericortiz.oncomed@gmail.com
* Correspondence: cristian.castillo2@gmail.com; Tel.: +52-5574302237

Citation: Castillo-Olea, C.; Conte-Galván, R.; Zuñiga, C.; Siono, A.; Huerta, A.; Bardhi, O.; Ortiz, E. Early Stage Identification of COVID-19 Patients in Mexico Using Machine Learning: A Case Study for the Tijuana General Hospital. Information 2021, 12, 490. https://doi.org/10.3390/info12120490

Academic Editors: Sidong Liu, Cristián Castillo Olea and Shlomo Berkovsky

Received: 13 October 2021
Accepted: 22 November 2021
Published: 24 November 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Background: The current pandemic caused by SARS-CoV-2 is of global concern. COVID-19 is an infectious disease caused by this recently discovered coronavirus.
Most people who get sick from COVID-19 experience mild, moderate, or severe symptoms. To support quick decisions regarding treatment and isolation needs, it is useful to determine which variables significantly indicate infection in the population served by the Tijuana General Hospital (Hospital General de Tijuana). A machine learning model was developed to identify significant early-stage variables in COVID-19 patients. Methods: The individual characteristics of the study subjects included age, gender, age group, symptoms, comorbidities, diagnosis, and outcomes. The model uses supervised learning algorithms to identify the significant variables that predict the diagnosis of COVID-19 with high precision. Results: Automatic algorithms were used to analyze the data. For Systolic Arterial Hypertension (SAH), the Logistic Regression algorithm obtained 91.0% area under the ROC curve (AUC), 80% accuracy (CA), 80% F1, 80.1% precision, and 80% recall for the selected variables. For Diabetes Mellitus (DM), the Logistic Regression algorithm obtained 91.2% AUC, 89.2% accuracy, 88.8% F1, 89.7% precision, and 89.2% recall for the selected variables. The neural network algorithm showed better results for patients with Obesity, obtaining 83.4% AUC, 91.4% accuracy, 89.9% F1, 90.6% precision, and 91.4% recall. Conclusions: Statistical analyses revealed that the most significant predictive symptoms in patients with SAH, DM, and Obesity were fatigue and myalgias/arthralgias. In patients with SAH and DM, the third dominant symptom was odynophagia.

Keywords: machine learning; COVID-19; identification

1. Introduction

A novel coronavirus, known as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), was identified in December 2019 as the cause of a respiratory illness called Coronavirus Disease 2019, or COVID-19 [1].
The origin of this virus is not yet confirmed, but an analysis of its genetic sequence suggests it is phylogenetically related to bat viruses similar to SARS (severe acute respiratory syndrome), making bats a possible key reservoir [2]. Symptoms of COVID-19 infection appear after an incubation period of approximately 5.2 days [3]. The period from the onset of COVID-19 symptoms to death ranges from 6 to 41 days, with a median of 14 days [4]. This period depends largely on the patient’s age and the state of their immune system [4].
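
For readers less familiar with the evaluation measures quoted in the abstract (AUC, accuracy, F1, precision, and recall), the following is a minimal sketch of how they can be computed for a binary classifier. It is not the authors' pipeline: the labels and scores are hypothetical toy data, the function names `weighted_metrics` and `auc` are illustrative, and support-weighted averaging across classes is an assumption. Under that assumption, weighted recall is mathematically equal to accuracy, which would be consistent with the matching accuracy and recall values reported in the abstract.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / count
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n  # class share of the data set
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive sample is scored
    above a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = COVID-19 positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

m = weighted_metrics(y_true, y_pred)
print(m)
print("AUC:", auc(y_true, scores))
```

Note that AUC is computed from the continuous scores, whereas the other four measures depend on the chosen decision threshold, so a model can rank well (high AUC) while its thresholded accuracy is lower, or the reverse.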