Elastic Net to Forecast COVID-19 Cases
Tim K Johnsen
Applied Data Science
San Jose State University
San Jose, CA, USA
tim.johnsen@sjsu.edu
Jerry Z Gao
Computer Engineering
San Jose State University
San Jose, CA, USA
jerry.gao@sjsu.edu
Abstract— Forecasting novel daily cases of COVID-19 is
crucial for medical, political, and other officials who handle
day-to-day COVID-19-related logistics. Current machine learning
approaches, though accurate, can be black boxes, specific to one
region, and/or hard to apply if the user has limited knowledge of
machine learning and programming. This limits the practical value
of otherwise robust machine learning methods, so they are not used
to their full potential. Thus, the presented
Elastic Net COVID-19 Forecaster, or EN-CoF for short, is
designed to provide an intuitive, generic, and easy to apply
forecaster. EN-CoF is a multi-linear regressor trained on time
series data to forecast the number of novel daily COVID-19 cases.
EN-CoF maintains high accuracy, on par with more complex models
such as ARIMA and Bi-LSTM, while gaining the advantages of
transparency, generalization, and accessibility.
Keywords— COVID-19, Elastic Net, Machine Learning,
Artificial Intelligence, Time Series, Forecast
I. INTRODUCTION
The 2019 novel coronavirus (COVID-19) was first observed
and studied in China [1], and has since turned into a global
pandemic. Daily cases are hard to forecast because there is a
large uncertainty in confirmed cases, thus “predictions using
more complex models may not be more reliable compared to
using a simpler model” [2]. Susceptible-Exposed-Infectious-
Removed (SEIR) models have been used in [2] and [3] to predict
how policies will affect infection rates. Artificial Intelligence
and other models can forecast far into the future, but “with
sizable associated uncertainty” [4]. A more realistic approach is
to forecast into the near future, using a region’s recent record of
novel daily COVID-19 cases (i.e. time series data).
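As an illustration of this near-future framing, a region's record of daily cases can be sliced into fixed-length windows of past days, each paired with the value to be forecast. The following is a minimal sketch only; the function name make_windows, the parameters n_lags and horizon, and the sample numbers are hypothetical and are not taken from the paper.

```python
def make_windows(daily_cases, n_lags=7, horizon=1):
    """Turn one region's daily-case series into (X, y) training pairs.

    Each row of X holds n_lags consecutive days (oldest first); the
    matching entry of y is the case count `horizon` days after the
    last day in that row. Names and defaults are illustrative only.
    """
    series = [float(c) for c in daily_cases]
    n_rows = len(series) - n_lags - horizon + 1
    if n_rows <= 0:
        raise ValueError("series too short for the requested window")
    X = [series[i:i + n_lags] for i in range(n_rows)]
    y = [series[i + n_lags + horizon - 1] for i in range(n_rows)]
    return X, y


# Made-up daily counts, used only to show the shape of the output.
cases = [10, 12, 15, 19, 24, 30, 37, 45, 54, 64]
X, y = make_windows(cases, n_lags=3, horizon=1)
```

Here the first row of X holds days 1 to 3 and its target is day 4; a larger horizon would pair the same inputs with a later day.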
Reference [5] used time series data to forecast daily cases
with the use of Long Short-Term Memory Network (LSTM) [6]
and Autoregressive Integrated Moving Average (ARIMA) [7]
models. The LSTM and ARIMA approaches were used to make
5-day forecasts for four countries: US, Italy, Spain, and
Germany. Other ARIMA approaches have been used to forecast
cases for specific regions [8-13]. ARIMA has been shown to be a
useful tool for forecasting into the near future. However,
ARIMA must be refit for each region. ARIMA-based models are
typically used for their ability to learn seasonality trends, which
COVID-19 has not been in circulation long enough to develop.
Most recently, Recurrent Neural Networks (RNN) were
studied, and it was shown that Bi-LSTM [14] can achieve
slightly greater accuracy than LSTM, Gated Recurrent Units
(GRU) [15], support vector regression [16], and ARIMA models
when applied to 10 countries [17]. Though accurate, RNN models
do little to explain how their predictions are made, a property
commonly referred to as “explainability”. Neural network methods
such as Grad-CAM [18] help, but much work is still needed to
improve explainability. Neural networks also typically require
domain knowledge in machine learning and programming to apply in
the field, which makes them harder to access.
Other models have been developed that use more novel
approaches. Reference [19] trained an ensemble of multiple
machine learning algorithms on time series data to forecast 1, 3,
and 6 days into the future, for ten Brazilian regions. Another
ensemble was used to forecast daily cases in Hungary [20].
Reference [21] used internet searches, news alerts, and
mechanistic models to create forecasts of 32 Chinese provinces.
Reference [22] used mobile phone-based surveys to focus on
specific towns under quarantine. A review of some recent AI
applications can be found in [23]. These more novel approaches
are intriguing and helpful; however, they are hard to deploy due
to the difficulty of understanding them and accessing their data,
especially for those untrained in artificial intelligence.
Current models have been applied to specific region(s) and,
even though they may give robust results, they are neither easily
explainable nor easy to deploy; thus they lack
generality, explainability, and accessibility. The presented
Elastic Net COVID-19 Forecaster (EN-CoF) aims to fill these
gaps. EN-CoF is intuitive – it simply makes forecasts by taking
a linear combination of a region’s time series data, and the
learned weights follow an intuitive trend. EN-CoF is generic –
it can be applied to any region, because it is trained on
aggregations of time-series data from multiple regions. EN-CoF
is easy to deploy – it requires no programming or AI knowledge,
as the only things needed to deploy EN-CoF are the learned static
weights and the region’s time series data. EN-CoF is robust –
performing with similar accuracy to more sophisticated models,
such as ARIMA and LSTM. EN-CoF was evaluated against 151
countries, the largest number of countries evaluated to date.
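To make the deployment claim concrete, a forecast of this kind reduces to a weighted sum of a region's most recent daily counts plus an optional intercept. The sketch below assumes that form; the function name forecast_next_day, the three placeholder weights, the zero intercept, and the clamping of negative outputs to zero are all assumptions made for illustration and are not the learned EN-CoF values or the authors' implementation.

```python
def forecast_next_day(recent_cases, weights, intercept=0.0):
    """Forecast as a linear combination: intercept + sum(w_i * x_i).

    recent_cases and weights are both ordered oldest to newest and
    must have the same length. Clamping at zero is an added
    assumption (case counts cannot be negative), not from the paper.
    """
    if len(recent_cases) != len(weights):
        raise ValueError("need exactly one weight per input day")
    prediction = intercept + sum(w * x for w, x in zip(weights, recent_cases))
    return max(0.0, prediction)


# Placeholder weights for illustration only, NOT the learned EN-CoF weights.
example_weights = [0.25, 0.25, 0.5]
example_recent = [100, 120, 150]  # made-up counts for the last three days
example_forecast = forecast_next_day(example_recent, example_weights)
```

Because the weights are static, the same arithmetic could be carried out in a spreadsheet or by hand, which is the sense in which no programming knowledge is required.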
II. METHODS
All models were trained and evaluated using Python and the
Scikit-learn [24], statsmodels [25], and Keras [26] libraries. All
code, data, results, and figures can be found on GitHub [27].
Data was collected from the European Centre for Disease
Prevention and Control (ECDC). Day 1 is the first day recorded in that country
© IEEE 2021. This article is free to access and download, along with rights for full text and
data mining, re-use and analysis.