Real-time estimation of COVID-19 cases using
machine learning and mathematical models - The
case of India
Pratima Kumari
Department of Computer Science & Engineering
Indian Institute of Technology
Roorkee, India
pkumari@cs.iitr.ac.in
Durga Toshniwal
Department of Computer Science & Engineering
Indian Institute of Technology
Roorkee, India
durgatoshniwal@gmail.com
Abstract—The COVID-19 pandemic has strained the economies
and resources of major countries across the world due to its high
infection and transmission rate. The count of COVID-19 cases
has skyrocketed in the past few days, placing immense pressure
on health officials and governments. Prediction models that
estimate the number of new infections are therefore urgently
required. In the present study, a machine learning technique,
namely an artificial neural network (ANN), is proposed for the
first time to forecast the COVID-19 outbreak in India. A
mathematical curve fitting model is additionally used to assess
the performance of the proposed ANN-based model. The impact of
preventive measures such as lockdown and social distancing on
the spread of COVID-19 is also analyzed by estimating the growth
of the epidemic under different transmission rates, and the
proposed model is compared with existing COVID-19 prediction
models. The proposed model is found to be highly accurate in
estimating the growth of COVID-19 related parameters, with the
lowest mean absolute percentage error (MAPE) values: 3.981 for
cumulative confirmed cases, 4.173 for daily confirmed cases and
4.413 for cumulative deceased cases. Hence, the present study can
assist health officers and administrators in arranging the
required resources and medical facilities in advance.
Index Terms—COVID-19, infectious disease, prediction, machine
learning, artificial neural network
I. INTRODUCTION
COVID-19, the novel coronavirus disease caused by severe acute
respiratory syndrome coronavirus 2 (SARS-CoV-2), is a highly
communicable infection that has been declared a global pandemic
by the World Health Organization (WHO). SARS-CoV-2 belongs to
a family of zoonotic coronaviruses that also includes Middle East
Respiratory Syndrome Coronavirus (MERS-CoV) and SARS-CoV,
both seen in past decades. The virus has high infectivity and
causes high morbidity among elderly people and those suffering
from severe diseases such as asthma, cancer and diabetes [1].
As of July 20, 2020, more than 14 million confirmed cases and
609,279 deaths had been reported worldwide. Some European
countries, including Spain (307,335 cases) and the UK (294,792
cases), and more recently the United States of America
(3,898,550 cases), are among the countries most affected by this
global health crisis [2]. The worsening conditions warrant
immediate implementation of containment strategies to stop the
spread. Since there is no treatment or
medicine available for the virus, effective planning of health
services and infrastructure is essential. Administrators
and public health officers are under immense pressure to
accommodate patients with COVID-19 symptoms. For this reason,
prediction tools are needed to estimate the likely number of new
COVID-19 cases in the near future, so that the resources and
materials required to handle the outbreak can be organized.
Public health officers may use such advance predictions to
arrange, effectively and promptly, the resources necessary for
medical treatment to overcome the pandemic [3].
In this regard, mathematicians and scientists working in
artificial intelligence have come forward to develop accurate
models for predicting COVID-19 cases in different countries.
Recently, Zhao et al. [3] developed a mathematical model to
forecast COVID-19 cases in China in the first half of January.
Similarly, Tang et al. [1] proposed a mathematical model that
determines the transmission rate of COVID-19 in order to predict
the confirmed cases over the next seven days. Roosa et al. [4]
applied a generalized logistic growth model to determine the
count of cumulative confirmed cases in China from February 5 to
February 24, 2020.
In addition, various statistical methods such as Autoregressive
Integrated Moving Average (ARIMA), Moving Average (MA),
Autoregressive (AR) and multivariate linear regression have been
used to predict COVID-19 cases. For instance, Ceylan [5]
applied ARIMA to predict the prevalence of COVID-19 in the
three most affected European countries, namely Italy, Spain
and France. Dehesh et al. [6] developed an ARIMA model
to predict confirmed COVID-19 cases in different countries.
Similarly, Benvenuto et al. [7] used Johns Hopkins
epidemiological data to determine the prevalence and incidence
trend of COVID-19 in Italy by applying an ARIMA model. However,
these reported statistical methods are linear models that cannot
capture the non-linearity in data. Moreover, these models
utilize regression without modeling non-linear functions, and
hence cannot learn the dynamics of the transmission rate
15th (IEEE) International Conference on Industrial and Information Systems (ICIIS) 2020
2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS) | 978-1-7281-8524-8/20/$31.00 ©2020 IEEE | DOI: 10.1109/ICIIS51140.2020.9342735
© IEEE 2021. This article is free to access and download, along with rights for full text and data mining, re-use and analysis.