Assessment of input data selection methods for BOD simulation using data-driven models: a case study
Azadeh Ahmadi & Zahra Fatemi & Sara Nazari
Received: 18 February 2017 / Accepted: 8 March 2018
© Springer International Publishing AG, part of Springer Nature 2018
Abstract Using multivariate statistical methods, this study interprets a data set of 23 water quality parameters measured over 5 years at 10 monitoring stations on the Karkheh River in southwestern Iran. Cluster analysis classifies the stations into three quality classes, and factor analysis identifies the most important factors for the whole set of parameters and for each class. The results reveal the effects of natural factors, soil weathering and erosion, urban and domestic wastewater, and agricultural and industrial wastewater on water quality at different levels and locations. Five input selection methods are then used to select inputs for modeling BOD: a correlation model, principal component analysis, gamma test combined with backward regression, gamma test combined with a genetic algorithm, and gamma test with an elimination method. Their efficiency is evaluated by simulating BOD with local linear regression, an artificial neural network, and genetic programming. For BOD simulation by local linear regression, gamma test with backward regression is the best of the five input selection methods (RMSE 0.27). For simulation by the artificial neural network, gamma test with a genetic algorithm performs best (RMSE 0.28). For simulation by genetic programming, gamma test with a genetic algorithm is again the most appropriate (RMSE 0.1303).
Keywords Surface water quality · Factor analysis · Principal component analysis · Gamma test · Genetic programming ·
Karkheh River
Environ Monit Assess (2018) 190:239
https://doi.org/10.1007/s10661-018-6608-4
A. Ahmadi (corresponding author) · Z. Fatemi · S. Nazari
Department of Civil Engineering, Isfahan University of Technology, Isfahan 8415683111, Iran
e-mail: aahmadi@cc.iut.ac.ir
Z. Fatemi e-mail: f.z.fatemi@gmail.com
S. Nazari e-mail: sara.nazari@cv.iut.ac.ir
Introduction
Rivers are among the most important water resources for agriculture, industry, and other uses. They are also highly vulnerable to pollution, both directly and indirectly, because of their dynamic nature and easy accessibility for waste disposal (Singh et al. 2009). Surface water quality depends on natural factors and on agricultural, industrial, and urban activities in the basin. Establishing an efficient water quality monitoring network is essential for determining river water quality. Such networks collect spatial and temporal data on water quality parameters that reflect the physical, chemical, and biological characteristics of the river. As a result, the parameters measured at the stations largely reflect overall changes in water quality (Baghvand et al. 2006). Adding monitoring stations and sampling more often allows the quality parameters to be measured and controlled more accurately, but it also raises costs. Therefore, it is