mathematics
Article
COVID-19 Data Imputation by Multiple Function-on-Function Principal Component Regression
Christian Acal †, Manuel Escabias †, Ana M. Aguilera *,† and Mariano J. Valderrama †
Citation: Acal, C.; Escabias, M.; Aguilera, A.M.; Valderrama, M.J. COVID-19 Data Imputation by
Multiple Function-on-Function Principal Component Regression. Mathematics 2021, 9, 1237.
https://doi.org/10.3390/math9111237
Academic Editor: Jin-Ting Zhang
Received: 21 April 2021
Accepted: 24 May 2021
Published: 28 May 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
Department of Statistics and O.R. and IMAG, University of Granada, 18071 Granada, Spain;
chracal@ugr.es (C.A.); escabias@ugr.es (M.E.); valderra@ugr.es (M.J.V.)
* Correspondence: aaguiler@ugr.es
† These authors contributed equally to this work.
Abstract: The aim of this paper is to impute missing data in the COVID-19 hospitalization and
intensive care curves of several Spanish regions. Since the curves of cases, deaths and
recovered people are completely observed, a function-on-function regression model is proposed
to estimate the missing values of the functional responses associated with the hospitalized and
intensive care curves. Estimating the functional coefficient model by principal component
regression on the completely observed data provides a prediction equation for imputing
the unobserved response data. An application to data from the first wave of COVID-19 in
Spain is developed after properly homogenizing, registering and smoothing the data on a common
interval so that the observed curves become comparable. Finally, Canonical Correlation Analysis
is performed on the functional principal components to interpret the relationship between hospital
occupancy rate and illness response variables.
Keywords: functional data analysis; function-on-function regression; functional principal
components; B-splines; COVID-19
1. Introduction
The SARS-CoV-2 virus has been the main global concern since it emerged in China at the
end of 2019. Its rapid spread has put all areas of society on alert, not only
the field of medicine. Nevertheless, a year and a half after the beginning of the pandemic,
the incidence of the virus does not seem to have decreased and the number of deaths continues its
upward trend throughout the world. To give some idea of the extremely negative impact of
the pandemic, Coronavirus Disease (COVID-19) had caused a total of 2,780,266 deaths
worldwide as of 28 March 2021, according to the real-time database developed by Johns
Hopkins University [1]. Another crucial issue arising from the illness is the economic
crisis, which has devastated all countries. For instance, the unemployment rate in the UK rose
to 5.1% in the last three months of 2020, according to official data.
In order to combat this terrible situation, there is a great need to understand the
development of the pandemic. Knowing its behaviour will enable correct decision making
to mitigate the spread of the virus and to restore people’s daily lives as soon as possible. To
this end, the scientific community is focusing all its efforts on developing new techniques
capable of modelling and predicting the evolution of COVID-19. The main variables of
interest for gauging the epidemiological situation in a country are the numbers
of positive, recovered and deceased cases. Another important indicator is the number of
people who are hospitalized or in intensive care units. Many authors have already
attempted to model these variables from different statistical
perspectives. A new Bayesian indicator is introduced in [2] to forecast the beginning of a
new wave. In [3], semi-empirical models based on the logistic map are considered in order
to predict the variables in different phases of the pandemic in Spain. Likewise, the authors of [4]
apply SIR models to analyse the trend of the disease around the world and, more specifically,