Real-time estimation of COVID-19 cases using machine learning and mathematical models - The case of India

Pratima Kumari
Department of Computer Science & Engineering, Indian Institute of Technology Roorkee, India
pkumari@cs.iitr.ac.in

Durga Toshniwal
Department of Computer Science & Engineering, Indian Institute of Technology Roorkee, India
durgatoshniwal@gmail.com

Abstract—The COVID-19 pandemic has strained the economies and resources of major countries across the world due to its high infection and transmission rate. The count of COVID-19 cases has skyrocketed in the past few days, creating immense pressure on health officials and governments. Therefore, prediction models to determine the number of new infections are urgently required. In the present study, a machine learning technique, namely an artificial neural network (ANN), is proposed to forecast the COVID-19 outbreak in India for the first time. A mathematical curve fitting model is additionally used to ascertain the performance of the proposed ANN-based model. The impact of preventive measures such as lockdown and social distancing on the spread of COVID-19 is also analyzed by estimating the growth of the epidemic under different transmission rates. Moreover, a comparison between the proposed and existing COVID-19 prediction models is demonstrated. The proposed model is found to be highly accurate in estimating the growth of COVID-19 related parameters, with the lowest MAPE values (cumulative confirmed cases (3.981), daily confirmed cases (4.173) and cumulative deceased cases (4.413)). Hence, the present study can assist health officers and administrators in arranging the required resources and medical facilities in advance.

Index Terms—Covid-19, infectious disease, prediction, machine learning, artificial neural network

I. INTRODUCTION

COVID-19, the novel coronavirus disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a highly communicable infection and has been declared a global pandemic by the World Health Organization (WHO). COVID-19 belongs to a family of zoonotic coronaviruses, similar to Middle East Respiratory Syndrome Coronavirus (MERS-CoV) and SARS-CoV seen in past decades. The virus has high infectivity and shows a high morbidity rate in elderly people and those suffering from severe diseases such as asthma, cancer and diabetes [1]. As of July 20, 2020, more than 14M confirmed cases and 609,279 deaths had been reported worldwide. Some European countries, including Spain (307,335 cases) and the UK (294,792 cases), and more recently the United States of America (3,898,550 cases), are among the countries most affected by this global health crisis [2]. The worsening conditions warrant immediate implementation of containment strategies to stop the spread. Since no treatment or medicine is available for the virus, effective planning of health services and infrastructure is highly required. Administrators and public health officers are under immense pressure to manage the accommodation of patients having COVID-19 symptoms. For this reason, prediction tools are needed to estimate the possible new COVID-19 cases in the near future, so that the resources and materials required to handle the outbreak can be organized. Public health officers may use advance predictions of the disease for the effective and prompt arrangement of the resources necessary for medical treatment to overcome the pandemic [3].

In this regard, the community of mathematicians and scientists working in artificial intelligence is coming forward to develop accurate prediction models to predict COVID-19 cases in different countries. Recently, Zhao et al. [3] developed a mathematical model to forecast COVID-19 cases in the first half of January in China. Similarly, Tang et al.
[1] proposed a mathematical model to determine the transmission rate of COVID-19 and to predict the confirmed COVID-19 cases over the next seven days. Roosa et al. [4] applied a generalized logistic growth model to determine the count of cumulative confirmed cases in China from 5th to 24th February, 2020. In addition, various statistical methods such as Autoregressive Integrated Moving Average (ARIMA), Moving Average (MA), Auto Regressive (AR) and multivariate linear regression have been used to predict COVID-19 cases. For instance, Ceylan [5] applied ARIMA to predict the prevalence of COVID-19 in the three most affected European countries, namely Italy, Spain and France. Dehesh et al. [6] developed an ARIMA model to predict confirmed COVID-19 cases in different countries. Similarly, Benvenuto et al. [7] used Johns Hopkins epidemiological data to determine the prevalence and incidence trend of COVID-19 by applying an ARIMA model for Italy. However, these reported statistical methods are linear models that cannot capture the non-linearity in data. Moreover, these models utilize regression without modeling non-linear functions, and hence cannot learn the dynamics of the transmission rate

2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS) | Part No: CFP2058A-ART | 978-1-7281-8524-8/20/$31.00 ©2020 IEEE | DOI: 10.1109/ICIIS51140.2020.9342735 | © IEEE 2021. This article is free to access and download, along with rights for full text and data mining, re-use and analysis.