Evaluating influences of seasonal variations and anthropogenic activities on alluvial groundwater hydrochemistry using ensemble learning approaches

Kunwar P. Singh a,b,*, Shikha Gupta a,b, Dinesh Mohan c

a Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi 110 001, India
b Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Council of Scientific & Industrial Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India
c School of Environmental Sciences, Jawaharlal Nehru University, New Delhi 110 067, India

Article info
Article history: Received 6 August 2013; received in revised form 2 January 2014; accepted 3 January 2014; available online 27 January 2014.
This manuscript was handled by Corrado Corradini, Editor-in-Chief, with the assistance of Barbara Mahler, Associate Editor.
Keywords: Ensemble learning; Decision tree forest; Decision treeboost; Groundwater hydrochemistry; Seasonal variations; Anthropogenic activity

Summary
The chemical composition and hydrochemistry of groundwater in a region are influenced by seasonal variations and anthropogenic activities. Understanding these influences and the factors responsible for them is vital for effective groundwater management. In this study, ensemble learning based classification and regression models were constructed and applied to groundwater hydrochemistry data from the Unnao and Ghaziabad regions of northern India. Specifically, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) models were constructed. The predictive and generalization abilities of the proposed models were evaluated using several statistical parameters and compared with those of the support vector machines (SVM) method.
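Among the statistical parameters used to assess the models, the study reports the misclassification rate (MR) for classification and, for regression, the correlation between measured and predicted values and the root mean squared error (RMSE). The following is a minimal sketch of how these quantities can be computed, assuming MR is the percentage of misclassified samples and the correlation is Pearson's coefficient; the section does not state either formula, and the function names and toy values are hypothetical illustrations, not the authors' code or data.

```python
import math
from typing import Hashable, Sequence


def misclassification_rate(observed: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Percentage of samples whose predicted class differs from the observed class
    (assumed definition of MR, expressed in percent)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for obs, pred in zip(observed, predicted) if obs != pred)
    return 100.0 * wrong / len(observed)


def pearson_r(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between measured and predicted values (assumed measure)."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must be of equal length with at least two values")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    if ss_m == 0 or ss_p == 0:
        raise ValueError("correlation is undefined when either series is constant")
    return cov / math.sqrt(ss_m * ss_p)


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between measured and predicted values."""
    n = len(measured)
    if n != len(predicted) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


if __name__ == "__main__":
    # Hypothetical toy values for illustration only; not data from this study.
    seasons_observed = ["pre", "pre", "post", "post"]
    seasons_predicted = ["pre", "post", "post", "post"]
    print(misclassification_rate(seasons_observed, seasons_predicted))  # 25.0

    cod_measured = [2.0, 4.0, 6.0]
    cod_predicted = [2.5, 3.5, 6.0]
    print(pearson_r(cod_measured, cod_predicted), rmse(cod_measured, cod_predicted))
```

MR is returned in percent to match the units of the ranges quoted in the summary, so a perfect classifier scores 0.0; a higher correlation and a lower RMSE both indicate closer agreement between measured and predicted values.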
The decision tree (SDT, DTF, DTB) and SVM models discriminated groundwater from shallow and deep aquifers, industrial and non-industrial areas, and pre- and post-monsoon seasons, yielding misclassification rates (MR) of 1.52–14.92% (SDT), 0.91–6.52% (DTF), 0.61–5.27% (DTB), and 1.52–11.69% (SVM). For the complete Ghaziabad data array, the corresponding regression models yielded correlations between measured and predicted COD values and root mean squared errors (RMSE) of 0.874 and 0.66 (SDT), 0.952 and 0.48 (DTF), 0.943 and 0.52 (DTB), and 0.785 and 0.85 (support vector regression, SVR), respectively. The DTF and DTB models outperformed SVM in both classification and regression. Incorporating the bagging and stochastic gradient boosting algorithms in the DTF and DTB models, respectively, enhanced their predictive ability. The proposed ensemble models successfully delineated the influences of seasonal variations and anthropogenic activities on groundwater hydrochemistry and can serve as effective tools for forecasting the chemical composition of groundwater to support its management.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Groundwater contamination is a serious global issue. Steadily rising contamination by a variety of toxic substances, together with falling groundwater tables caused by over-exploitation to meet growing water demand and by declining annual recharge, has placed groundwater resources under severe constraints worldwide. Interferences that alter the natural water balance have further affected the redox chemistry of aquifers, mobilizing several chemical constituents present in the solid matrices (Singh et al., 2007). The chemical composition and hydrochemistry of groundwater in a region are largely determined by prevailing natural processes (atmospheric deposition, precipitation, evapotranspiration, and soil/rock–water interactions) and by anthropogenic activities (Singh et al., 2005).
Because the frequency and magnitude of natural processes and anthropogenic activities in a region vary in time and space, their influences are reflected in groundwater hydrochemistry, which exhibits wide spatial and temporal fluctuations (Singh et al., 2007). Groundwater resources in alluvial regions are more prone to contamination because of the higher population densities and, consequently, the intense agricultural and industrial activities in these areas (EPA, 1993). Knowledge and understanding of the factors that influence groundwater composition and hydrochemistry in a region are essential to develop

Journal of Hydrology 511 (2014) 254–266
Journal homepage: www.elsevier.com/locate/jhydrol
http://dx.doi.org/10.1016/j.jhydrol.2014.01.004
0022-1694/© 2014 Elsevier B.V. All rights reserved.
* Corresponding author at: Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Council of Scientific & Industrial Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India. Tel.: +91 522 2476091; fax: +91 522 2628227.
E-mail addresses: kpsingh_52@yahoo.com, kunwarpsingh@gmail.com (K.P. Singh).