A Comparative Analysis of Different Regression Models on Predicting the Spread of Covid-19 in India

Mrittika Chakraborty, Dept. of Computer Sc. & Engg., University of Kalyani, Kalyani, India, mrittikachakraborty@gmail.com
Anirban Mukhopadhyay, SMIEEE, Dept. of Computer Sc. & Engg., University of Kalyani, Kalyani, India, anirban@klyuniv.ac.in
Ujjwal Maulik, FIEEE, Dept. of Computer Sc. & Engg., Jadavpur University, Kolkata, India, umaulik@cse.jdvu.ac.in

Abstract—According to the World Health Organization (WHO) Situation Reports on Coronavirus Disease (Covid-19), as of 15th May 2020, India has 81,970 total confirmed cases and 2,649 total deaths, and is still within the limits of the community transmission phase. In this study, we analyze the spread of the disease and the fatalities caused up to 15th May 2020, as per the data obtained. A granular-computing-based regression model, namely Granular Box Regression, is used along with Linear Regression for comparative analysis to study the increase in the number of confirmed cases and deaths based on days and an increase in the number of samples tested per day. A separate analysis is also conducted to evaluate the performance of Polynomial Regression on the same dataset. The performance of the different models is evaluated using R-squared, Mean Absolute Error, Root Mean Squared Error, and Mean Squared Error values.

Index Terms—Covid-19, coronavirus, Linear regression, Granular Box Regression (GBR), Polynomial regression.

I. INTRODUCTION

The ongoing pandemic of coronavirus disease 2019 (Covid-19) was first reported in Wuhan, China, in December 2019. The disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and spreads primarily among people in close proximity (within about 6 feet), most often via droplets produced by sneezing, coughing, or talking. According to the reports of the World Health Organization (WHO), no licensed vaccines are yet available.
Hence, key public health strategies such as surveillance, contact tracing, isolation, and quarantine (wherever necessary) become the core methods to combat the deadly disease.

Machine learning tools have always played a vital role in healthcare analytics, especially in risk prediction for chronic diseases. Supervised learning and a novel biclustering approach to association rule mining have been used to study the interactions between human immunodeficiency virus (HIV-1) and human proteins [1], [2]. Disease prediction and big-data-driven crisis analysis using machine learning methodologies have been conducted in recent times [3], [4]. Large-scale prediction of host genes associated with infectious diseases has also been studied using a Deep Neural Network (DNN) model-based approach [5]. Real-time epidemiology-based forecasting has been used to study the most prevalent influenza outbreaks [6]. Nsoesie et al. [7] provided a systematic review of approaches useful for forecasting the dynamics of influenza outbreaks, which could be used for decision making regarding the allocation of health resources.

With the spread of Covid-19 declared a global pandemic, crisis management in healthcare using prediction algorithms has become an inevitable aspect of surveillance across the country as well as worldwide. Some initial studies have been conducted on the spread of Covid-19, covering its potential effects on human lives in generating anxiety disorders [8] and the impacts of the epidemic [9]. The association of severe Covid-19 infection with Diabetes Mellitus, and its effect on the mortality rate, has also been studied recently [10].
However, through regression analytics, we can identify the future threat of an increase in the number of patients, forecast the groups of patients more prone to the spread, and anticipate equipment supply needs across medical wards, including isolation beds, Personal Protective Equipment (PPE) kits, and ventilators. Among the different machine learning algorithms, various prediction rules, Bayesian networks, and regression models have been used extensively to study such pandemic outbreaks, as in [7], [11].

In this study, we perform time-series-based predictions on datasets derived from the Covid-19 data collected using India Covid-19 Tracker Data. Linear models have been used for simpler evaluation and more intelligible interpretation. The Linear Regression model and the Granular Box Regression (GBR) model of [12] have been compared on these datasets to identify the best-fit model. We have also studied the performance of the Polynomial Regression model on the same datasets.

As no effective vaccine has yet been developed for this disease, it is evident that flattening the curve of the rise in the spread must be the key objective in managing this crisis. The main objective of this study is to analyze the probable spread of the disease in the country while choosing the best predictor model. Another objective is to find the appropriate regression

2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA), Galgotias University, Greater Noida, UP, India. Oct 30-31, 2020. © IEEE 2020. This article is free to access and download, along with rights for full text and data mining, re-use and analysis.