mathematics

Article

COVID-19 Data Imputation by Multiple Function-on-Function Principal Component Regression

Christian Acal, Manuel Escabias, Ana M. Aguilera *,† and Mariano J. Valderrama

Department of Statistics and O.R. and IMAG, University of Granada, 18071 Granada, Spain; chracal@ugr.es (C.A.); escabias@ugr.es (M.E.); valderra@ugr.es (M.J.V.)
* Correspondence: aaguiler@ugr.es
† These authors contributed equally to this work.

Citation: Acal, C.; Escabias, M.; Aguilera, A.M.; Valderrama, M.J. COVID-19 Data Imputation by Multiple Function-on-Function Principal Component Regression. Mathematics 2021, 9, 1237. https://doi.org/10.3390/math9111237

Academic Editor: Jin-Ting Zhang

Received: 21 April 2021
Accepted: 24 May 2021
Published: 28 May 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: The aim of this paper is the imputation of missing data in the COVID-19 hospitalized and intensive care curves of several Spanish regions. Since the curves of cases, deaths and recovered people are completely observed, a function-on-function regression model is proposed to estimate the missing values of the functional responses associated with the hospitalized and intensive care curves. Estimating the functional coefficient model by principal component regression on the completely observed data provides a prediction equation for imputing the unobserved data of the response. An application with data from the first wave of COVID-19 in Spain is developed after properly homogenizing, registering and smoothing the data on a common interval so that the observed curves become comparable.
Finally, canonical correlation analysis is performed on the functional principal components to interpret the relationship between the hospital occupancy rate and the illness response variables.

Keywords: functional data analysis; function-on-function regression; functional principal components; B-splines; COVID-19

1. Introduction

The virus SARS-CoV-2 has been the main global concern since it emerged in China at the end of 2019. Its rapid spread has put all areas of society on alert, not only the field of medicine. Nevertheless, a year and a half after the beginning of the pandemic, the incidence of the virus does not seem to have decreased, and the number of deaths continues its upward trend throughout the world. To give some idea of the extremely negative impact of the pandemic, Coronavirus Disease (COVID-19) had caused a total of 2,780,266 deaths worldwide as of 28 March 2021, according to the real-time database developed by Johns Hopkins University [1]. Another crucial issue derived from the illness is the economic crisis, which has devastated all countries. For instance, the unemployment rate in the UK rose to 5.1% in the last three months of 2020, according to official data.

In order to combat this terrible situation, there is a great need to understand the development of the pandemic. Knowing its behaviour will enable correct decision making to mitigate the spread of the virus and to restore people’s daily lives as soon as possible. To this end, the scientific community is focusing all its efforts on developing new techniques capable of modelling and predicting the evolution of COVID-19.

The main variables of interest for gauging the epidemiological situation in a country are the numbers of positive, recovered and deceased cases. Another important indicator is the number of people who are hospitalized or in intensive care units. Many authors have already analysed these variables from different statistical perspectives.
A new Bayesian indicator is introduced in [2] to forecast the beginning of a new wave. In [3], semi-empirical models based on the logistic map are considered in order to predict the variables in different phases of the pandemic in Spain. Likewise, Ref. [4] applies SIR models to analyse the trend of the disease over the world and, more specifically,