Ecological Informatics 64 (2021) 101389
Available online 31 July 2021
1574-9541/© 2021 Elsevier B.V. All rights reserved.
Naïve Bayes ensemble models for groundwater potential mapping
Binh Thai Pham a,b,*, Abolfazl Jaafari c,*, Tran Van Phong d, Davood Mafi-Gholami e, Mahdis Amiri f, Nguyen Van Tao d, Van-Hao Duong g, Indra Prakash h
a University of Transport Technology, 54 Trieu Khuc, Thanh Xuan, Hanoi 100000, Viet Nam
b Civil and Environmental Engineering Program, Graduate School of Advanced Science and Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8527, Japan
c Research Institute of Forests and Rangelands, Agricultural Research, Education, and Extension Organization (AREEO), Tehran 1496813111, Iran
d Institute of Geological Sciences, Vietnam Academy of Science and Technology, 84 Chua Lang, Dong Da, Hanoi, Viet Nam
e Department of Forest Sciences, Faculty of Natural Resources and Earth Sciences, Shahrekord University, Shahrekord 8818634141, Iran
f Department of Watershed and Arid Zone Management, Gorgan University of Agricultural Sciences and Natural Resources, Gorgan 4918943464, Iran
g Hanoi University of Mining and Geology, No. 18 Vien Street, Bac Tu Liem District, Hanoi, Viet Nam
h DDG (R) Geological Survey of India, Gandhinagar 382010, India
ARTICLE INFO
Keywords:
Machine learning
Ensemble modeling
Naïve Bayes
Bagging
AdaBoost
Rotation Forest
ABSTRACT
Groundwater potential maps are important tools for the sustainable management of water resources, especially
in agricultural countries such as Vietnam. Here, we describe the development and application of a
spatially explicit ensemble modeling framework for estimating groundwater potential across
Kon Tum Province, Vietnam. Within this framework, the Naïve Bayes (NB)
method was integrated with the Bagging (B), AdaBoost (AB), and Rotation Forest (RF) ensemble learning
techniques to develop three ensemble models, namely BNB, ABNB, and RFNB. Well yield data and
thirteen explanatory variables (i.e., elevation, aspect, slope, curvature, river density, topographic wetness index,
sediment transport index (STI), soil type, geology, land use, rainfall, flow direction, and flow accumulation) were
used to train and independently validate the single NB
model and its three ensembles. Several performance metrics (i.e., area under the receiver operating characteristic
curve (AUC), root mean square error (RMSE), accuracy, sensitivity, specificity, negative predictive value, and
positive predictive value) showed that the three ensemble models outperformed the single NB
model in groundwater potential mapping. The RFNB ensemble model (AUC = 0.849, accuracy = 83.33%,
sensitivity = 100%, specificity = 75%, and RMSE = 0.406) performed best for mapping
groundwater potential in Kon Tum Province, followed by the ABNB (AUC = 0.844), BNB (AUC = 0.815), and
single NB (AUC = 0.786) models. Further, the correlation-based feature selection method identified
elevation, slope, land use, rainfall, and STI as the most useful explanatory variables for explaining the
distribution of groundwater potential in Kon Tum Province. The methodology proposed in this case study and the
resulting potential maps can help managers align water use patterns with the shared benefits and costs of
different users and develop strategies for sustainable groundwater exploitation, preservation, and
management.
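As an illustration of the performance metrics named above, the following minimal sketch computes accuracy, sensitivity, specificity, PPV, NPV, RMSE, and AUC in plain Python. The function names (`confusion_metrics`, `rmse`, `auc`), the confusion-matrix counts, and the small label and probability lists are hypothetical choices made for this example. They are not the study's validation data, and the sketch is not the authors' implementation.

```python
from math import sqrt

def confusion_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts, as fractions.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, one predicted positive, one predicted negative).
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

def rmse(y_true, y_prob):
    """Root mean square error between 0/1 labels and predicted probabilities."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_prob)]
    return sqrt(sum(squared) / len(squared))

def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts for illustration only (not taken from the paper).
m = confusion_metrics(tp=4, fp=2, tn=6, fn=0)

# Hypothetical labels and predicted probabilities for illustration only.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]
example_auc = auc(labels, probs)
example_rmse = rmse(labels, probs)
```

The hypothetical counts were chosen so that accuracy (10/12, about 83.33%), sensitivity (100%), and specificity (75%) take the same form as the values quoted for RFNB; the actual validation counts are not stated in this section. A sensitivity of 100% alongside a specificity of 75% corresponds to no missed positives but some false alarms.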
Abbreviations: Naïve Bayes (NB); Bagging (B); AdaBoost (AB); Rotation Forest (RF); Bagging-Naïve Bayes (BNB); AdaBoost-Naïve Bayes (ABNB); Rotation
Forest-Naïve Bayes (RFNB); area under the curve (AUC); artificial neural network (ANN); adaptive neuro-fuzzy inference system (ANFIS); support vector machine
(SVM); support vector regression (SVR); boosted regression tree (BRT); classification and regression tree (CART); multivariate adaptive regression spline (MARS);
correlation-based feature selection method (CBFS); Vietnam Academy for Water Resources (VAWR); topographic wetness index (TWI); sediment transport index
(STI); digital elevation model (DEM); Vietnam Meteorological Organization (VMO); Water Resources Planning and Investigation of Vietnam (WRPI); average merit
(AM); principal component analysis (PCA); positive predictive value (PPV); negative predictive value (NPV); receiver operating characteristic (ROC) curve; root
mean square error (RMSE); true positive (TP); false positive (FP); false negative (FN); true negative (TN); Reduced Pruning Error Tree (RPET); Random Subspace
(RSS).
* Corresponding authors.
E-mail addresses: binhpt@utt.edu.vn (B.T. Pham), jaafari@rifr-ac.ir (A. Jaafari).
journal homepage: www.elsevier.com/locate/ecolinf
https://doi.org/10.1016/j.ecoinf.2021.101389
Received 8 May 2021; Received in revised form 26 June 2021; Accepted 26 July 2021