A Comparative Analysis of Different Regression Models on Predicting the Spread of Covid-19 in India
Mrittika Chakraborty
Dept of Computer Sc. & Engg.
University of Kalyani
Kalyani, India
mrittikachakraborty@gmail.com
Anirban Mukhopadhyay, SMIEEE
Dept of Computer Sc. & Engg.
University of Kalyani
Kalyani, India
anirban@klyuniv.ac.in
Ujjwal Maulik, FIEEE
Dept of Computer Sc. & Engg.
Jadavpur University
Kolkata, India
umaulik@cse.jdvu.ac.in
Abstract—According to the World Health Organization (WHO)
Situation Reports on Coronavirus Disease (Covid-19), as of
15th May 2020, India had 81,970 total confirmed cases and
2649 total deaths, and had not yet entered the community
transmission phase. In this study, we analyze the spread of
the disease and the fatalities it caused up to 15th May 2020,
as per the data obtained. A granular computing based regression
model, namely Granular Box Regression, is used along with
Linear Regression in a comparative analysis of the increase
in the number of confirmed cases and deaths with respect to
days, and with respect to the increase in the number of samples
tested per day. A separate analysis is also conducted to evaluate
the performance of Polynomial Regression on the same dataset.
The performance of the different models has been evaluated
using R-squared, Mean Absolute Error, Root Mean Squared Error
and Mean Squared Error values.
Index Terms—Covid-19, coronavirus, Linear regression, Granular Box Regression (GBR), Polynomial regression.
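The four evaluation measures listed in the abstract have standard definitions. As a hedged illustration (our own sketch, not the authors' code), the following pure-Python function computes them for a list of observed values and a list of model predictions. The function name `regression_metrics` and the toy inputs are hypothetical and are not drawn from the Covid-19 data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R-squared, MAE, MSE and RMSE for paired observations/predictions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n            # Mean Squared Error
    mae = sum(abs(r) for r in residuals) / n           # Mean Absolute Error
    rmse = math.sqrt(mse)                              # Root Mean Squared Error
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)             # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true) # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot                         # coefficient of determination
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}


# Toy values for illustration only (not Covid-19 counts).
print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

In the toy call only the last prediction misses, and it misses by exactly one, so MSE and MAE coincide. With larger errors, MSE and RMSE penalize outlying residuals more heavily than MAE.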
I. INTRODUCTION
The ongoing pandemic of coronavirus disease 2019
(Covid-19) was first reported in Wuhan, China, in December
2019. The disease is caused by severe acute respiratory
syndrome coronavirus 2 (SARS-CoV-2) and spreads primarily
among people in close proximity (within about 6 feet), most
often via droplets produced by coughing, sneezing and talking.
According to reports of the World Health Organization (WHO),
no licensed vaccine is yet available. Hence, key public health
strategies such as surveillance, contact tracing, isolation and
quarantine (wherever necessary) have become the core methods
to combat this deadly disease.
Machine learning tools have long played a vital role in
healthcare analytics, especially in risk prediction for chronic
diseases. Supervised learning and a novel biclustering approach
to association rule mining have been used to study the
interactions between human immunodeficiency virus type 1
(HIV-1) and human proteins [1] [2]. Disease prediction and
big-data-driven crisis analysis using machine learning
methodologies have also been conducted in recent times [3] [4].
Large-scale prediction of host genes associated with infectious
diseases has been studied using a Deep Neural Network (DNN)
based approach [5]. Real-time epidemiological forecasting has
been used to study prevalent influenza outbreaks [6]. Nsoesie
et al. [7] provided a systematic review of approaches for
forecasting the dynamics of influenza outbreaks, which could
support decision making on the allocation of health resources.
Since the spread of Covid-19 has been declared a global
pandemic, crisis management in healthcare using prediction
algorithms has become an inevitable aspect of surveillance,
both across the country and worldwide. Initial studies have
examined the spread of Covid-19, its potential to cause
anxiety disorders [8], and the impacts of the epidemic [9].
The association of severe Covid-19 infection with Diabetes
Mellitus, and its effect on the mortality rate, has also been
studied recently [10]. Regression analytics can further help
identify future threats such as a rise in the number of
patients, forecast the groups of patients most vulnerable to
the disease, and estimate the equipment required across medical
wards, including isolation beds, Personal Protective Equipment
(PPE) kits and ventilators.
Among the different machine learning techniques, prediction
rules, Bayesian networks and regression models have been used
extensively to study such pandemic outbreaks [7] [11]. In this
study, we perform time-series-based predictions on datasets
built from the Covid-19 data collected from the India Covid-19
Tracker.
Linear models have been used for simpler evaluation and
intelligible interpretation. The Linear Regression model and
the Granular Box Regression (GBR) model, as described in [12],
have been compared on these datasets to identify the
best-fitting model.
We have also studied the performance of the Polynomial
Regression model on the same datasets.
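The comparison between a straight-line model and a polynomial model can be illustrated with a short sketch. The following pure-Python code is our own illustration, not the authors' implementation. It fits a degree-1 (linear) and a degree-2 (polynomial) least-squares model to a synthetic cumulative-count series by solving the normal equations, then compares their R-squared values. The series, the choice of degree 2, and the helper names `polyfit`, `predict` and `r_squared` are assumptions made for illustration. The study itself uses the India Covid-19 Tracker data, and GBR is not covered here.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) b = X^T y.

    Returns coefficients b such that y ~ b[0] + b[1]*x + ... + b[degree]*x**degree.
    Hypothetical helper for illustration.
    """
    n = degree + 1
    # Normal matrix X^T X and right-hand side X^T y for the Vandermonde design.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = rhs[i] - sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = s / a[i][i]
    return coef


def predict(coef, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coef))


def r_squared(ys, preds):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic, illustrative series (NOT real Covid-19 counts): a cumulative
# count growing quadratically over 30 days.
days = list(range(1, 31))
counts = [5 * d * d + 20 * d + 100 for d in days]

for degree in (1, 2):
    coef = polyfit(days, counts, degree)
    preds = [predict(coef, d) for d in days]
    print(f"degree {degree}: R^2 = {r_squared(counts, preds):.4f}")
```

Because the synthetic series is exactly quadratic, the degree-2 model should reach an R-squared of essentially 1, while the straight line cannot follow the curvature. On real case counts the gap depends on the data. For higher degrees, a library least-squares solver on centred inputs would be numerically safer than the plain normal equations used here.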
As no effective vaccine has yet been developed for this
disease, flattening the curve of its spread must be the key
objective in managing this crisis. The main objective of this
study is to analyze the probable spread of the disease in the
country while choosing the best predictor
model. Another objective is to find the appropriate regression
2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA)
Galgotias University, Greater Noida, UP, India. Oct 30-31, 2020
© IEEE 2020. This article is free to access and download, along with rights
for full text and data mining, re-use and analysis