Hydrochemical analysis of groundwater using a tree-based model

M. Iggy Litaor a,*, H. Brielmann b, O. Reichmann c, M. Shenker c

a Tel-Hai College, Dept. of Environmental Sciences, Upper Galilee 12210, Israel
b Helmholtz Zentrum Muenchen – German Research Center for Environmental Health, Institute of Groundwater Ecology, Neuherberg, Germany
c The Hebrew University of Jerusalem, P.O. Box 12, Rehovot 76100, Israel

Article info
Article history: Received 27 May 2009; received in revised form 17 March 2010; accepted 13 April 2010
This manuscript was handled by L. Charlet, Editor in Chief, with the assistance of Prosun Bhattacharya, Associate Editor
Keywords: Hydrochemical indices; Binary decision tree model; Aquifer evaluation

Summary
Hydrochemical indices are commonly used to ascertain aquifer characteristics, salinity problems, anthropogenic inputs and resource management, among others. This study was conducted to test the applicability of a binary decision tree model to aquifer evaluation using hydrochemical indices as input. The main advantage of the tree-based model compared to other commonly used statistical procedures such as cluster and factor analyses is the ability to classify groundwater samples with assigned probability and the reduction of a large data set into a few significant variables without creating new factors. We tested the model using data sets collected from headwater springs of the Jordan River, Israel. The model evaluation consisted of several levels of complexity, from simple separation between the calcium–magnesium–bicarbonate water type of karstic aquifers to the more challenging separation of calcium–sodium–bicarbonate water type flowing through perched and regional basaltic aquifers. In all cases, the model assigned measures for goodness of fit in the form of misclassification errors and singled out the most significant variable in the analysis.
The model proceeded through a sequence of partitions providing insight into different possible pathways and changing lithology. The model results were extremely useful in constraining the interpretation of geological heterogeneity and constructing a conceptual flow model for a given aquifer. The tree model clearly identified the hydrochemical indices that were excluded from the analysis, thus providing information that can lead to a decrease in the number of routinely analyzed variables and a significant reduction in laboratory cost.
© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Hydrochemical indices are commonly used to ascertain the possible chemical reactions of groundwater along a flow path and to identify groundwater evolution and recharge in the aquifer (Zhu et al., 2007; Jianhua et al., 2008). Hydrochemical indices have also been used to classify groundwater according to flow path and residence time (Adamski, 2000), evaluate groundwater salinity problems (Alyamani, 1999; Elewa and El Nahry, 2008), assess the impact of microbial activity and chemical fertilizers (Kim et al., 2005), determine seawater intrusion and the impact of anthropogenic activities (Demirel and Kulege, 2005), and quantify the impact of landfill on groundwater quality (Singh et al., 2008; Srivastava and Ramanathan, 2008). Wen et al. (2007) used hydrochemical indices to promote sustainable development and effective management of groundwater resources, while Jalali (2007) used extensive graphical analysis of hydrochemical indices to ascertain the relative importance of anthropogenic addition of Cl and of NO3 for groundwater quality. Hydrochemical indices are also useful in studying groundwater flow in aquifers with a minimal gradient, where accurate and meaningful hydraulic measurements are difficult to obtain and the flow direction can only be gleaned from a spatiotemporal analysis of the indices (Litaor et al., 2008; Londoño et al., 2008).
Multivariate statistical analysis of these indices is often used to understand and characterize groundwater processes (e.g., Suk and Lee, 1999; Kim et al., 2005; Park et al., 2005; Kumar et al., 2008; Singh et al., 2008; Srivastava and Ramanathan, 2008). The most commonly used multivariate statistical tools in this context are factor and cluster analyses. Factor analysis has been used to resolve issues such as aquifer boundaries (Locsey and Cox, 2003) and groundwater flow paths (Wang et al., 2001), to evaluate groundwater quality (Suk and Lee, 1999; Kim et al., 2005), and to identify anthropogenic impacts versus background (Helena et al., 2000; Pereira et al., 2003). Factor analysis is normally implemented to reduce a large data set to a small number of factors. Each factor consists of several highly intercorrelated measured variables and the factors are then interpreted and linked to a specific hydrochemical process. However, some data sets require a relatively large number of factors (>5) to account for most of the variance within the data set, which makes the link to a specific hydrochemical process somewhat subjective.

* Corresponding author. Tel.: +972 4 8181725. E-mail address: litaori@telhai.ac.il (M.I. Litaor).
Journal of Hydrology 387 (2010) 273–282. doi:10.1016/j.jhydrol.2010.04.017
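
In contrast to factor analysis, a tree-based model partitions the samples on the measured variables themselves, without creating new factors. A single binary partition of this kind can be sketched as follows. This is an illustrative sketch only, not the model used in this study: the sample values are synthetic, the index names (Na_Ca, Mg_Ca) and the aquifer labels are hypothetical, and the Gini impurity is an assumed splitting criterion.

```python
from collections import Counter

# Synthetic, illustrative samples only (NOT data from this study).
# Index names and aquifer labels are hypothetical.
SAMPLES = [
    ({"Na_Ca": 0.05, "Mg_Ca": 0.30}, "karstic"),
    ({"Na_Ca": 0.08, "Mg_Ca": 0.45}, "karstic"),
    ({"Na_Ca": 0.12, "Mg_Ca": 0.35}, "karstic"),
    ({"Na_Ca": 0.10, "Mg_Ca": 0.50}, "karstic"),
    ({"Na_Ca": 0.60, "Mg_Ca": 0.40}, "basaltic"),
    ({"Na_Ca": 0.75, "Mg_Ca": 0.32}, "basaltic"),
    ({"Na_Ca": 0.90, "Mg_Ca": 0.48}, "basaltic"),
    ({"Na_Ca": 0.70, "Mg_Ca": 0.55}, "basaltic"),
]


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_labels(samples, var, thr):
    """Labels of the samples falling on each side of a threshold."""
    left = [y for x, y in samples if x[var] <= thr]
    right = [y for x, y in samples if x[var] > thr]
    return left, right


def best_split(samples):
    """Search every variable and midpoint threshold for the purest partition."""
    best = None
    for var in samples[0][0]:
        values = sorted({x[var] for x, _ in samples})
        # Candidate thresholds are midpoints between distinct observed values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left, right = split_labels(samples, var, thr)
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if best is None or score < best[2]:
                best = (var, thr, score)
    if best is None:
        raise ValueError("no variable has two distinct values to split on")
    return best


def leaf_probabilities(labels):
    """Class proportions in a leaf, used as the assigned probabilities."""
    return {k: c / len(labels) for k, c in Counter(labels).items()}


def fit_stump(samples):
    """Fit a one-level binary tree (a single partition)."""
    var, thr, _ = best_split(samples)
    left, right = split_labels(samples, var, thr)
    return {
        "variable": var,
        "threshold": thr,
        "left": leaf_probabilities(left),
        "right": leaf_probabilities(right),
    }


def predict(stump, x):
    """Return the most probable class and its probability for one sample."""
    side = "left" if x[stump["variable"]] <= stump["threshold"] else "right"
    probs = stump[side]
    label = max(probs, key=probs.get)
    return label, probs[label]


def misclassification_error(stump, samples):
    """Fraction of samples whose predicted class differs from the label."""
    wrong = sum(1 for x, y in samples if predict(stump, x)[0] != y)
    return wrong / len(samples)


stump = fit_stump(SAMPLES)
error = misclassification_error(stump, SAMPLES)
print(stump["variable"], stump["threshold"], error)
```

In this sketch, the variable chosen for the partition plays the role of the most significant variable, the class proportions in each leaf give the assigned probability, and any variable never selected for a split is a candidate for exclusion from routine analysis. A full tree would apply the same search recursively within each leaf.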