ORIGINAL ARTICLE
Application of the group method of data handling and variable
importance analysis for prediction and modelling of saltwater
intrusion processes in coastal aquifers
Alvin Lal 1,2 · Bithin Datta 1
Received: 17 September 2019 / Accepted: 24 July 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020
Abstract
Data-driven mathematical models are powerful prediction tools that are used to approximate the solution responses
of numerical saltwater intrusion simulation models. Employing data-driven prediction models as replacements for
complex groundwater flow and transport models enables prediction of future scenarios. Most importantly, it also
helps reduce computational time, effort and requirements when developing optimal coastal aquifer management
methodologies using complex, large-scale coupled simulation–optimization models. In this study, new data-driven
mathematical models, namely group method of data handling (GMDH)-based prediction models, are developed and used to
predict salinity concentration in a coastal aquifer by mimicking the responses of a variable-density flow and solute
transport numerical simulation model. For comparison and evaluation purposes, the prediction performance of the GMDH
models was compared with that of well-established support vector machine regression and genetic programming-based models.
In addition, one important characteristic of the GMDH models is explored and evaluated, i.e. the ability to identify the
set of input predictor variables (pumping rates) that have the most significant impact on the outcomes (salinity
concentration at monitoring locations). To confirm variable importance, three tests are conducted in which new GMDH models
are constructed using subsets of the original datasets. In TEST 1, new GMDH models are constructed using only the most
influential variables (pumping rates at selected locations). In TEST 2, a subset of 20 variables (the 10 most and the 10
least influential variables) is used to develop new GMDH models. In TEST 3, a subset of the least influential variables
is used to develop GMDH models. The performance evaluation results demonstrate that GMDH models developed using
the entire dataset had reasonable prediction accuracy and efficiency. The comparative performance evaluation results for the
three test scenarios highlighted the importance of appropriately selecting relevant input pumping rates when
developing accurate prediction models. The results suggested that incorporating the least influential variables deteriorates the
accuracy of the prediction models; thus, by considering the most influential pumping rates, it is possible to develop more
accurate and efficient salinity prediction models. Overall, the evaluation results from this study establish that GMDH
models, together with their inherent input variable ranking capability, can be utilized as accurate and efficient coastal
saltwater intrusion prediction models. Hence, GMDH models are viable saltwater intrusion modelling tools, which can be employed
in future regional-scale saltwater intrusion prediction and management investigations.
Keywords GMDH prediction models · Saltwater intrusion · Coastal aquifers · Variable selection
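As an illustration of the idea summarized in the abstract, the following is a minimal sketch of a single GMDH-style layer. It is a textbook-style simplification, not the models or software used in this study. It assumes quadratic two-input polynomial neurons fitted by least squares, selection by validation error, and a simple importance score that counts how often each input appears in the retained neurons. The function names (`quad_features`, `gmdh_layer`, `rank_inputs`) and the synthetic "pumping rate" and "salinity" data are hypothetical.

```python
import itertools
import numpy as np

def quad_features(xi, xj):
    # Quadratic two-input polynomial terms: 1, xi, xj, xi*xj, xi^2, xj^2
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])

def gmdh_layer(X_train, y_train, X_val, y_val, keep=3):
    """Fit one quadratic neuron per input pair and keep the `keep`
    neurons with the lowest validation RMSE (external criterion)."""
    candidates = []
    for i, j in itertools.combinations(range(X_train.shape[1]), 2):
        A = quad_features(X_train[:, i], X_train[:, j])
        coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
        pred = quad_features(X_val[:, i], X_val[:, j]) @ coef
        rmse = float(np.sqrt(np.mean((pred - y_val) ** 2)))
        candidates.append((rmse, (i, j)))
    candidates.sort()  # ascending validation RMSE
    return candidates[:keep]

def rank_inputs(selected, n_inputs):
    # Assumed importance score: number of retained neurons using each input
    counts = np.zeros(n_inputs, dtype=int)
    for _, (i, j) in selected:
        counts[i] += 1
        counts[j] += 1
    return counts

# Synthetic data (hypothetical): 5 "pumping rates", "salinity" driven by inputs 0 and 1
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 5))
y = 2.0 * X[:, 0] + X[:, 0] * X[:, 1] + 0.01 * rng.normal(size=200)

selected = gmdh_layer(X[:100], y[:100], X[100:], y[100:], keep=3)
counts = rank_inputs(selected, X.shape[1])
print("retained neurons (RMSE, input pair):", selected)
print("input usage counts:", counts)
```

In this sketch, inputs that never appear in a retained neuron would be candidates for the "least influential" group, and refitting on a chosen subset of columns mirrors the spirit of the three tests described above. A full GMDH network would stack several such layers, feeding the outputs of retained neurons forward as new inputs.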
1 Introduction
Overexploitation of groundwater resources has become a
common issue in coastal regions around the world, many of
which are now experiencing extensive saltwater intrusion
and subsequent groundwater contamination. It is estimated
that 20% of the world’s aquifers are over-exploited,
resulting in serious consequences such as saltwater
✉ Alvin Lal, alvin.lal@my.jcu.edu.au
1 Discipline of Civil Engineering, College of Science and Engineering, James Cook University, Townsville, QLD 4811, Australia
2 School of Pure Sciences, College of Engineering, Science and Technology, Fiji National University, Suva, Fiji
Neural Computing and Applications
https://doi.org/10.1007/s00521-020-05232-8