Artificial Intelligence Based Analytics to Support Environmental Remote Monitoring

Masudur R. Siddiquee*, Santosh Joshi†, Himanshu Upadhyay†, Leonel Lagos†

*Applied Research Center, Florida International University, 10555 W. Flagler Street, Miami, FL 33174, United States
msidd021@fiu.edu, sajoshi@fiu.edu, upadhyay@fiu.edu, lagosl@fiu.edu

INTRODUCTION

Environmental monitoring is a long-term activity to check for soil and groundwater contamination after the deactivation and decommissioning (D&D) of a nuclear site [1]. This monitoring is generally based on collecting and analyzing soil and water samples from various locations, including groundwater-monitoring wells, surface water in the wetlands, and surface water in the local stream. Tracking contaminant concentration dynamics through sample collection and analysis is resource intensive and time consuming. Thus, there is a recent trend toward applying artificial intelligence (AI) together with automated remote sensing using remote systems and robotics [2].

The key step in implementing AI and remote monitoring systems is training the AI system to automatically identify patterns in the contaminant dynamics at the nuclear site. This step requires a substantial volume of data sampled consistently in both time and space. However, missing values are a common issue in legacy datasets [3]. Another challenge is that the manual collection process makes it difficult to gather samples on a regular periodic basis. Although interpolation is an easy solution when the missing values lie between available time-series data points, missing values before the first or after the last available data point are challenging to extrapolate.
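The distinction between interior gaps and gaps at the ends of a series can be illustrated with a minimal sketch. It assumes evenly spaced time steps and uses plain linear interpolation; the function name `interpolate_interior` and the sample values are hypothetical and are not taken from the dataset or the package discussed in this summary.

```python
from typing import List, Optional


def interpolate_interior(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly fill gaps that lie between two observed points.

    Assumes evenly spaced time steps; None marks a missing value.
    Leading and trailing gaps are left as None because no observed
    point bounds them on one side.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of consecutive observed points.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical series with gaps at both ends and in the middle.
series = [None, None, 2.0, None, None, 5.0, None]
result = interpolate_interior(series)
print(result)  # [None, None, 2.0, 3.0, 4.0, 5.0, None]
```

The interior gap is filled, while the leading and trailing gaps remain missing. Those remaining gaps are the ones that call for a forecasting model rather than interpolation.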
We propose a data preprocessing algorithm that uses the Prophet Forecasting Model (PFM) together with other data processing steps to address the missing-value problem and to extend the time series while retaining its linear and non-linear trends.

DESCRIPTION OF THE WORK

The PFM is a time series forecasting procedure based on an additive model. It uses a decomposable time series model with three main components: trend (nonperiodic), periodic, and sporadic events [4]. Prophet frames forecasting as a curve-fitting problem, which the researchers solved with a generalized additive model (GAM) [5]. The key advantage of a GAM is that it decomposes easily and can accommodate new components as required. The model was implemented in R and Python and distributed under the name Prophet. Because it uses separate functions for the periodic and nonperiodic components, the PFM is well suited to forecasting time series in which seasonality and long-term trend are important features. A key benefit of the PFM is that, through the GAM formulation, its forecasts retain both the linear and non-linear trends of the time series. Moreover, it handles outliers by default and smooths the entire time series at the output.

The designed preprocessing algorithm for legacy datasets was tested on publicly available data published by the US Department of Energy (DOE) for the Hanford site remediation activity [6]. The dataset was extracted from the published annual report. The data points in this representative legacy dataset are very sparse and irregularly distributed over the time axis, as shown in Fig. 1.
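For reference, the three-component additive model behind the PFM, described earlier in this section, is usually written in the Prophet literature as the sum below. The symbols are not defined in this summary and are given here as an assumption about notation: $g(t)$ is the trend (nonperiodic) component, $s(t)$ the periodic component, $h(t)$ the effect of sporadic events, and $\varepsilon_t$ an error term.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Because the components enter as a plain sum, each can be fitted and inspected separately, which is the decomposition property noted above.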
To use this dataset for AI and remote monitoring system design, it must either be trimmed to a common time span that encompasses the maximum number of time series, sacrificing many data points, or be extrapolated over a larger time span, which will not retain the periodicity and trend of the time series.

Fig. 1. Soil and groundwater legacy dataset’s data sparsity issue.

Preprocessing time series with the PFM to retain periodicity and trend in the processed time series has some specific requirements: the input dataset needs to have specific columns, the dataset needs to be free from missing values, and the fitted model needs to be provided with appropriate time points for forecasting. Moreover, the PFM does not support backward forecasting by default, a capability needed to align all the time series in the dataset. A Python package was developed to preprocess the dataset