mathematics

Article

Machine Learning Techniques Applied to Predict Tropospheric Ozone in a Semi-Arid Climate Region

Md Al Masum Bhuiyan 1,*, Ramanjit K. Sahi 1, Md Romyull Islam 1 and Suhail Mahmud 2

1 Department of Mathematics & Statistics, Austin Peay State University, Clarksville, TN 37044, USA; sahir@apsu.edu (R.K.S.); mislam@my.apsu.edu (M.R.I.)
2 Earth & Environmental Systems Institute (EESI), The Pennsylvania State University, State College, PA 16802, USA; sfm6095@psu.edu
* Correspondence: bhuiyanm@apsu.edu; Tel.: +1-931-221-7964

Citation: Bhuiyan, M.A.M.; Sahi, R.K.; Islam, M.R.; Mahmud, S. Machine Learning Techniques Applied to Predict Tropospheric Ozone in a Semi-Arid Climate Region. Mathematics 2021, 9, 2901. https://doi.org/10.3390/math9222901

Academic Editors: Monica Bianchini and Maria Lucia Sampoli

Received: 11 October 2021
Accepted: 11 November 2021
Published: 15 November 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: In the last decade, ground-level ozone exposure has led to a significant increase in environmental and health risks. Thus, it is essential to measure and monitor atmospheric ozone concentration levels. Recent improvements in machine learning (ML) processes, based on statistical modeling, have provided a better approach to addressing these risks. In this study, we compare the Naive Bayes, K-Nearest Neighbors, Decision Tree, Stochastic Gradient Descent, and Extreme Gradient Boosting (XGBoost) algorithms, together with an ensemble of them, for classifying ground-level ozone concentration in the El Paso-Juarez area.
As El Paso-Juarez is a non-attainment city, the concentrations of several air pollutants and meteorological parameters were analyzed. We found that the ensemble (soft voting classifier) of the algorithms used in this paper provides high classification accuracy (94.55%) for the ozone dataset. Furthermore, the variables most responsible for high ozone concentration, such as nitrogen oxides (NOx), wind speed and gust, and solar radiation, were identified.

Keywords: tropospheric ozone; machine learning; El Paso-Juarez; semi-arid climate

1. Introduction

Environmental problems, especially air pollution, are gaining attention because air pollution is one of the most serious hazards to human health. It is an invisible killer that takes numerous human lives every year. Thus, it is essential to predict whether a day will be polluted or not. The atmosphere contains various pollutants; ground-level ozone in particular adversely affects human health and some delicate plants and vegetation. High concentrations of ground-level ozone are of significant concern for many metropolitan cities in the US and Mexico. In this paper, we focus on the border cities of El Paso, Texas, and Juarez, Mexico. The climate of this region is arid and has characteristics of the urban southwestern US climate [1]. The region's air quality problem is partially the result of industrial activities and high automobile emissions. Moreover, the geopolitical region of El Paso-Juarez is characterized by exceptional meteorological conditions, such as higher planetary boundary layer heights (PBLHs) than any surrounding city, due to its complex topography.

El Paso, being a semi-arid climate region, experiences high ozone episodes in the summer season. Days with an 8-h ozone concentration of more than 70 parts per billion by volume (ppbv) are defined as high ozone episodes [2,3].
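The threshold rule above can be illustrated with a minimal Python sketch. It is not the authors' preprocessing code. It assumes that the "8-h ozone concentration" of a day is the maximum of the running 8-hour means of that day's hourly readings, and that a day with fewer than eight readings is left unflagged. The function names `rolling_means` and `is_high_ozone_day` and the sample values are hypothetical; only the 8-h window and the 70 ppbv threshold come from the text.

```python
from typing import List, Sequence

HIGH_OZONE_THRESHOLD_PPBV = 70.0  # threshold stated in the text
WINDOW_HOURS = 8                  # averaging window stated in the text


def rolling_means(hourly_ppbv: Sequence[float], window: int = WINDOW_HOURS) -> List[float]:
    """Return the mean of every full run of `window` consecutive hourly readings."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for start in range(len(hourly_ppbv) - window + 1):
        chunk = hourly_ppbv[start:start + window]
        means.append(sum(chunk) / window)
    return means


def is_high_ozone_day(
    hourly_ppbv: Sequence[float],
    threshold: float = HIGH_OZONE_THRESHOLD_PPBV,
    window: int = WINDOW_HOURS,
) -> bool:
    """Flag a day whose maximum running 8-h mean exceeds the threshold.

    Assumption (not from the source): the daily value is the maximum of the
    running means, and a day with fewer than `window` readings is not flagged.
    """
    means = rolling_means(hourly_ppbv, window)
    return bool(means) and max(means) > threshold


if __name__ == "__main__":
    # Hypothetical hourly readings in ppbv, for illustration only.
    calm_day = [40.0] * 24
    episode_day = [30.0] * 8 + [80.0] * 8 + [30.0] * 8
    print(is_high_ozone_day(calm_day))
    print(is_high_ozone_day(episode_day))
```

The comparison is strict, so a day whose maximum 8-h mean is exactly 70 ppbv is not flagged, matching the wording "more than 70". A binary label of this kind is one plausible target for the classification task described in the abstract; regulatory rounding and data-completeness rules are deliberately left out of the sketch.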
Figure 1 shows the annual high ozone events recorded from 2000 to 2019 by the Texas Commission on Environmental Quality (TCEQ) ground stations, known as Continuous Ambient Monitoring Stations (CAMS). In this region, the highest ozone levels are commonly recorded during the summer months of June to August (Figure 2). High ozone is caused by several factors, such as high temperatures (June and July are the peak summer months, with an average temperature of 40 degrees Celsius) with calm winds (mean value