mathematics
Article
Machine Learning Techniques Applied to Predict Tropospheric
Ozone in a Semi-Arid Climate Region
Md Al Masum Bhuiyan 1,*, Ramanjit K. Sahi 1, Md Romyull Islam 1 and Suhail Mahmud 2
Citation: Bhuiyan, M.A.M.; Sahi, R.K.; Islam, M.R.; Mahmud, S. Machine Learning Techniques Applied to Predict Tropospheric Ozone in a Semi-Arid Climate Region. Mathematics 2021, 9, 2901. https://doi.org/10.3390/math9222901
Academic Editors: Monica Bianchini and Maria Lucia Sampoli
Received: 11 October 2021
Accepted: 11 November 2021
Published: 15 November 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Department of Mathematics & Statistics, Austin Peay State University, Clarksville, TN 37044, USA; sahir@apsu.edu (R.K.S.); mislam@my.apsu.edu (M.R.I.)
2 Earth & Environmental Systems Institute (EESI), The Pennsylvania State University, State College, PA 16802, USA; sfm6095@psu.edu
* Correspondence: bhuiyanm@apsu.edu; Tel.: +1-931-221-7964
Abstract: In the last decade, ground-level ozone exposure has led to a significant increase in
environmental and health risks. Thus, it is essential to measure and monitor atmospheric ozone
concentration levels. Recent improvements in machine learning (ML) methods, which build on
statistical modeling, offer a better approach to addressing these risks. In this study, we compare the
Naive Bayes, K-Nearest Neighbors, Decision Tree, Stochastic Gradient Descent, and Extreme Gradient
Boosting (XGBoost) algorithms and their ensemble to classify ground-level ozone concentration
in the El Paso-Juarez area. As El Paso-Juarez is a non-attainment city, the concentrations of several air
pollutants and meteorological parameters were analyzed. We found that the ensemble (soft voting
classifier) of the algorithms used in this paper provides high classification accuracy (94.55%) for the
ozone dataset. Furthermore, we identified the variables most responsible for high ozone concentration,
such as nitrogen oxides (NOx), wind speed and gust, and solar radiation.
Keywords: tropospheric ozone; machine learning; El Paso-Juarez; semi-arid climate
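To make the soft voting idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each base classifier outputs class probabilities for every day; soft voting then averages those probabilities across classifiers and picks the class with the highest average. The function name soft_vote, the two-class layout [low ozone, high ozone], and all probability values are hypothetical and chosen only for illustration.

```python
def soft_vote(probas, weights=None):
    """Soft voting: average per-model class probabilities, then take the argmax.

    probas[m][i][c] is model m's probability that sample i belongs to class c.
    weights optionally gives one non-negative weight per model (default: equal).
    Returns (labels, averaged), where averaged[i][c] is the weighted mean
    probability of class c for sample i.
    """
    n_models = len(probas)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_samples = len(probas[0])
    n_classes = len(probas[0][0])

    labels, averaged = [], []
    for i in range(n_samples):
        avg = [
            sum(w * probas[m][i][c] for m, w in enumerate(weights)) / total
            for c in range(n_classes)
        ]
        averaged.append(avg)
        # Index of the class with the largest averaged probability.
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels, averaged


# Hypothetical probabilities from three models for two days.
# Class 0 = low ozone, class 1 = high ozone (illustrative values only).
model_probas = [
    [[0.55, 0.45], [0.8, 0.2]],  # model A
    [[0.55, 0.45], [0.7, 0.3]],  # model B
    [[0.10, 0.90], [0.6, 0.4]],  # model C
]

labels, averaged = soft_vote(model_probas)
print(labels)  # predicted class index per day
```

For the first day, two of the three models lean slightly toward class 0, so a majority (hard) vote would pick class 0, while the averaged probabilities favor class 1 because the third model is much more confident. One common library option for the same technique is scikit-learn's VotingClassifier with voting="soft"; the abstract does not say which implementation was used.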
1. Introduction
Environmental problems, especially air pollution, are gaining attention because air pollution is
one of the most serious health hazards to humans. It is an invisible killer that takes numerous
human lives every year. Thus, it is essential to predict whether a day will be polluted
or not. Presently, there are various pollutants in the atmosphere. Ground-level ozone
in particular adversely affects human health as well as delicate plants and vegetation.
High concentrations of ground-level ozone are a significant concern
for many metropolitan cities in the US and Mexico. In this paper, we focus on the
border cities of El Paso in Texas and Juarez in Mexico. The climate of this region is arid
and has characteristics of the urban southwestern US climate [1]. The region’s air quality
problem is partially the result of industrial activities and high automobile emissions.
Moreover, the geopolitical region of El Paso-Juarez is characterized by exceptional
meteorological conditions, such as higher planetary boundary layer heights (PBLHs) than
any other surrounding city, due to its complex topography.
El Paso, being a semi-arid climate region, experiences high ozone episodes in the
summer season. Days with an 8-h ozone concentration of more than 70 parts per billion
by volume (ppbv) are defined as high ozone episodes [2,3]. Figure 1 shows
the annual high ozone events recorded by the Texas Commission on
Environmental Quality (TCEQ) ground stations, known as Continuous Ambient Monitoring
Stations (CAMS), from 2000 to 2019. In this region, the highest ozone levels are commonly
recorded during the summer months of June to August (Figure 2). High ozone is caused by
several factors, such as high temperature (June and July are the peak summer
months, with an average temperature of 40 degrees Celsius) with calm winds (mean value