RaDEN: A Scalable and Efficient Radiation Data Engineering

Hadi Fadlallah
Lebanese University
Beirut, Lebanon
Hadi.Fadlullah@gmail.com

Yehia Taher
Université de Versailles Saint-Quentin-en-Yvelines (UVSQ)
Versailles, France
yehia.taher@uvsq.fr

Ali Jaber
Lebanese University
Beirut, Lebanon
ali.jaber@ul.edu.lb

Abstract

Detecting and monitoring radiation levels is a critical duty for governments and researchers because of the serious threat radiation poses to humans. Building a centralized radiation monitoring system was challenging in the past century, until the rise of the Internet of Things (IoT). Radiation levels are measured by wireless sensors whose output is transferred to a back-end server that monitors radiation, raises alerts when high levels are detected, and stores the data for further analysis. Traditional data warehousing systems can no longer handle this type of data because of (1) data collection speed, (2) rapid data growth, and (3) data diversity. With the rise of Big Data, new technologies have been developed to handle data with these characteristics. In this paper, we propose RaDEn, a scalable and fault-tolerant radiation data engineering system that relies on Big Data technologies such as Hadoop, Kafka, Spark, and Hive. The system is responsible for (1) reading data from sensors and other sources, (2) monitoring the radiation level in real time, (3) storing the data, and (4) providing on-demand data retrieval to users. In addition, we have implemented our system and conducted experiments in a real case scenario in collaboration with the department of environmental radiation control at the Lebanese Atomic Energy Commission (LAEC-CNRS).

Keywords

Radiation, data engineering, Big Data, radiation monitoring, real-time processing

I. INTRODUCTION

Radiation pollution is a critical concern due to its detrimental impact on living beings and the environment.
There are different types of radiation stemming from various radioactive materials and natural resources [1]. High levels of radiation, gamma radiation in particular, cause severe damage to human health [2]. Controlling radiation levels is therefore critically important, and monitoring radiation sources is indispensable to doing so. The advent of the Internet of Things (IoT), and of sensors specifically, has laid the foundation for smart ecosystems that collect, process, and analyze radiation data in real time [3]. Radiation sensors collect data and transmit it over communication networks such as telecommunication networks, Wi-Fi, and the Internet to a computational engine that measures radiation levels. Radiation monitoring sensors record data continuously; consequently, massive volumes of data are generated at high speed. Conventional data engineering technologies such as data warehouses are not adequate to handle this type of data.

Several data engineering technologies have been proposed in the literature, such as [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], and many others. These solutions aim to engineer radiation pollution data. However, they have several limitations, which we summarize as follows: (1) They rely mainly on traditional data technologies. (2) Most of them focus on data collection only. (3) Real-time data collection and processing is outside their scope. (4) They do not address scalability and fault tolerance. A solution that addresses these limitations is needed. In this paper, we propose a solution called RaDEn, a scalable and fault-tolerant system for radiation data engineering that relies mainly on new data technologies able to handle massive volumes of data generated at high speed.
RaDEn can read data from different sources, monitor radiation levels in real time, and store data in a scalable repository that provides on-demand data retrieval to users for further analysis.

The remainder of this paper is organized as follows. Section 2 briefly introduces RaDEn. Section 3 details its development. Section 4 demonstrates RaDEn. We conclude our work in Section 5.

II. AN OVERVIEW OF RADEN

RaDEn is a scalable platform developed for radiation data engineering. It allows fetching massive volumes of data from different sources. RaDEn enables users to collect different types of data, including structured databases, data streams, and flat files. RaDEn has a radiation data lake that stores data in a scalable cluster, processes them with advanced techniques, and visualizes them using the best-fitting methods. RaDEn adopts both real-time and batch philosophies for collecting and processing data. This hybrid approach enables users to perform both real-time and batch operations. Data streaming from sensors can be collected by users in real time, and files can be ingested in storage