Exploring relationship between COVID-19 cases and eating habits using data of London boroughs

Abdulhadi Algbear, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia, Abdulhadi.IT@hotmail.com
Mohammed Ali Alqarni, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia, alqarni@uj.edu.sa
Muhammad Murtaza Khan, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia, mkhan@uj.edu.sa
Muhammad Usman Ilyas, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia, milyas@uj.edu.sa

Abstract—COVID-19 has affected everyone in the world in one way or another. At the time of this writing, there are approximately 110.9 million reported cases and approximately 2.4 million deaths worldwide, which puts the ratio of deaths to reported cases at a little over 2%. To better understand the reasons for COVID-19 infections and deaths, efforts are underway to uncover relationships between them and pre-existing health conditions. Some studies have focused on the causes of infection and the use of protective equipment, while others have focused on identifying relationships between deaths and pre-existing diabetes, heart conditions or hypertension. Research has established that pre-existing health conditions can be associated with people's eating habits. Therefore, we have tried to determine whether there is any relationship between people's eating habits and COVID-19 infections. To do so, we used data on purchases made at Tesco supermarkets by residents of the London boroughs. Data on pre-existing health conditions for the same regions were obtained from the London Datastore. Our study indicates that, for the London boroughs' data, food products containing alcohol, carbohydrates and fats are weakly correlated with the number of COVID-19 cases.
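The abstract's central finding is a weak correlation between food-group purchases and COVID-19 case counts across boroughs. As a minimal sketch of how such a coefficient can be computed, the Python code below assumes Pearson's correlation coefficient (this section does not state which correlation measure the study used) and treats each list index as one borough. The `strength` labels and their cut-offs are an assumed rule of thumb, not thresholds taken from the paper, and the toy values are illustrative only, not data from the study. Pearson's r captures only linear association; mutual information, listed among the keywords, can also detect non-linear dependence.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples.

    In this sketch each index is one borough: x[i] could be a (hypothetical)
    share of purchases from one food group, y[i] the COVID-19 case count.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance numerator and the two sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def strength(r: float) -> str:
    """Label |r| using an assumed rule of thumb (cut-offs vary between authors)."""
    a = abs(r)
    if a < 0.1:
        return "negligible"
    if a < 0.4:
        return "weak"
    if a < 0.7:
        return "moderate"
    return "strong"


if __name__ == "__main__":
    # Toy values for illustration only; these are NOT data from the study.
    food_share = [1.0, 2.0, 3.0, 4.0, 5.0]
    cases = [3.0, 1.0, 2.0, 5.0, 2.0]
    r = pearson_r(food_share, cases)
    print(f"r = {r:.2f} ({strength(r)})")  # prints: r = 0.21 (weak)
```

Because the coefficient is computed over borough-level aggregates, even a non-zero r describes an association between areas rather than individuals and, on its own, says nothing about causality.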
We believe that these results warrant a more detailed investigation of causality.

Keywords—COVID-19, correlation, mutual information, regression, food groups

I. INTRODUCTION

Modern data aggregation methods have made large, diverse data sets available that can be used to determine and establish relationships between different facets of life. Data about jobs, the economy, housing, health, the environment and purchases are now available and can be used to determine direct relationships at scales that were not previously possible. The availability of large amounts of data has ushered in a new era in data analytics. Thus, when coronavirus disease 2019 (COVID-19), the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), began spreading, the collection, availability and analysis of data became important. Although less critical than finding a treatment or developing a vaccine, data were still essential for tracking the spread of the disease and identifying super-spreaders [1][2]. Given that approximately 110.9 million people have been infected with the virus [3], monitoring the spread of COVID-19 remains an important area of research. A secondary focus for researchers has been to understand whether there is any relationship between pre-existing health conditions and COVID-19 infections or deaths. Myths and conspiracy theories linking stress to COVID-19 infection were addressed by Georgiou et al. in [5]; the authors clarified that people under stress are not more likely than others to be affected by COVID-19. In [6], Jordan et al. observed that several studies based on data collected from Wuhan, Italy and the UK cited an increased risk of COVID-19-related death for people with pre-existing health conditions. However, they highlighted that these studies involved small populations, ranging from 100 to 40,000 participants, and that their data were not readily available and, in some cases, were incomplete.
Therefore, there is a need for better data acquisition and analysis in order to reach more reliable conclusions. A study by the Chinese Center for Disease Control and Prevention (China CDC) of approximately 44,000 laboratory-confirmed cases, discussed in [7], highlighted that advanced age, heart conditions, cancer, hypertension, chronic respiratory diseases and diabetes increase the risk of death from a coronavirus infection. Data collected from patients in China suggested that smoking and obesity were linked with a higher risk of severe infection and death [8]. In another study, Stefan et al. [9] found that patients with obesity are at increased risk of severe COVID-19 symptoms. In this work, we try to identify whether eating habits have a direct relationship with the number of COVID-19 cases in a particular region. This is based on the premise that eating habits generally affect an individual's health, and pre-existing conditions appear to be related to COVID-19. Therefore, it is of interest to examine whether any food group is related to the number of COVID-19 cases in a geographic region. To conduct this analysis, data on COVID-19 cases, on pre-existing health conditions and on people's eating habits are required for the same region. Not all of these data were available at the same spatial resolution or for the same temporal window. However, we were able to compile data from different sources at the level of boroughs for the London region.

2021 National Computing Colleges Conference (NCCC) | 978-1-7281-6719-0/20/$31.00 ©2021 IEEE | DOI: 10.1109/NCCC49330.2021.9428879

The rest of the paper is organized as follows. Section II introduces the sources and types of data used in this study. Section III presents a correlation-based analysis between