Comparison of evolutionary computation techniques for noise injected neural network training to estimate longitudinal dispersion coefficients in rivers

Adam P. Piotrowski*, Pawel M. Rowinski, Jaroslaw J. Napiorkowski

Institute of Geophysics, Polish Academy of Sciences, Ks. Janusza 64, 01-452 Warsaw, Poland

Keywords: Differential Evolution; Particle Swarm Optimization; Evolution Strategy; Neural Networks; Evolutionary Computation; Longitudinal dispersion; Noise injection

Abstract

This study presents a comparison of various evolutionary computation (EC) optimization techniques applied to train noise-injected multi-layer perceptron neural networks used for estimation of the longitudinal dispersion coefficient in rivers. Special attention is paid to recently developed variants of the Differential Evolution (DE) algorithm. The most commonly used gradient-based optimization methods have two significant drawbacks: they cannot cope with non-differentiable problems and quickly converge to local optima. These problems can be avoided by the application of EC techniques. Although a great number of EC algorithms have been proposed in recent years, only some of them have been applied to neural network training, usually with no comparison to other methods. We restrict our comparison to a regression problem with limited data, using the noise injection technique to avoid premature convergence and to improve the robustness of the model. The optimization methods tested in the present paper are: Distributed DE with Explorative–Exploitative Population Families, Self-Adaptive DE, DE with Global and Local Neighbors, Grouping DE, JADE, Comprehensive Learning Particle Swarm Optimization, Efficient Population Utilization Strategy Particle Swarm Optimization and Covariance Matrix Adaptation – Evolution Strategy.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Artificial neural networks (ANN) (Haykin, 1999) have been widely used in various scientific disciplines (Al-Garni, 2010; Paliwar & Kumar, 2009; Wen, Lan, & Shih, 2009). Among a number of neural network types, multi-layer perceptron neural networks (MLP) are probably the most popular. Their advantages are simplicity and a relatively low number of parameters to be estimated. In the case of real-world problems, such as estimation of the longitudinal dispersion coefficient needed in models of pollutant transport in rivers (Kashefipour, Falconer, & Lin, 2002; Rowinski, Piotrowski, & Napiorkowski, 2005), the measured data are often noisy, contaminated with errors and do not properly represent the underlying population. Moreover, the number of available measurements is frequently limited, which results in poor generalization properties of a model fitted to the training data (Bishop, 1995a).

One of the main issues in the application of neural networks is overfitting to the training data. The first and obvious, but not sufficient, remedy is reducing the number of parameters, which smooths the output of the ANN. Other methods directly applicable to this situation, i.e. leading to good generalization properties of an ANN model, are regularization methods, the Bayesian approach, early stopping, cross-validation and noise injection (Bishop, 1995a). The last method is used in the present paper.

In most applications ANNs are trained with gradient-based algorithms. In such cases the algorithms often become trapped in local optima. Although training may be repeated a number of times starting from different initial positions, this rarely leads to finding the global optimum of a multimodal high-dimensional problem (see the empirical example in Martinez-Estudillo, Martinez-Estudillo, Hervias-Martinez, & Garcia-Pedrajas, 2006).
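The noise injection idea can be sketched in a few lines: at each training presentation, fresh zero-mean random noise is added to the inputs, so the network never sees exactly the same batch twice. This is a minimal illustration only — the function name and the noise level `sigma` are our own choices, not taken from the paper:

```python
import numpy as np

def noise_injected_batch(x, y, sigma=0.05, rng=None):
    """Return a training batch with zero-mean Gaussian noise added to the inputs.

    Injecting fresh noise at every presentation acts as a regularizer:
    it discourages overfitting when the training set is small.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_noisy = x + rng.normal(0.0, sigma, size=x.shape)
    return x_noisy, y  # targets are left unchanged

# Tiny demonstration on synthetic data
rng = np.random.default_rng(42)
x_train = rng.uniform(-1.0, 1.0, size=(20, 3))   # 20 samples, 3 features
y_train = np.sin(x_train).sum(axis=1)

x_noisy, y_same = noise_injected_batch(x_train, y_train, sigma=0.05, rng=rng)
```

Because the noise is redrawn on every call, repeated calls yield slightly different training inputs around the measured values, which is what smooths the fitted mapping.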
A possible way to overcome the optimization problems inherent to gradient-based neural network training is the application of evolutionary computation (EC) algorithms, which have become more popular in recent years. However, the number of papers in which EC algorithms are compared with gradient methods for ANN training is very limited. Most of them clearly show the advantage of EC techniques (Huang, Chen, Chen, & Chang, 2009; Martinez-Estudillo et al., 2006; Sexton & Gupta, 2000; Zhang, Zhang, Lok, & Lyu, 2007; also Heidrich-Meisner & Igel, 2009 for reinforcement learning), while others claim that various EC techniques may fail in comparison with gradient-based approaches (Ilonen, Kamarainen, & Lampinen, 2003; Mandischer, 2002). This suggests that ANN training performance should differ significantly across EC methods. As a great number of EC algorithms have been proposed, it is important to find the most efficient ones for ANN training. The most successful EC methods

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.08.016
* Corresponding author. Tel.: +48 22 6915858; fax: +48 22 6915915. E-mail address: adampp@igf.edu.pl (A.P. Piotrowski).
Expert Systems with Applications 39 (2012) 1354–1361
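To make the contrast with gradient training concrete, the sketch below optimizes the flat weight vector of a small single-hidden-layer perceptron with the classic DE/rand/1/bin scheme. The paper itself tests more advanced DE variants; the network size, DE settings and all names here are illustrative assumptions, not the authors' configuration:

```python
import numpy as np

def mlp_forward(w, x, n_hidden=4):
    """Single-hidden-layer MLP with tanh units; w is a flat weight vector."""
    n_in = x.shape[1]
    i = 0
    w1 = w[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    b1 = w[i:i + n_hidden]; i += n_hidden
    w2 = w[i:i + n_hidden]; i += n_hidden
    b2 = w[i]
    return np.tanh(x @ w1 + b1) @ w2 + b2

def mse(w, x, y):
    return float(np.mean((mlp_forward(w, x) - y) ** 2))

def de_train(x, y, dim, pop_size=30, gens=200, f=0.7, cr=0.9, seed=0):
    """Classic DE/rand/1/bin applied to the MLP weight vector (no gradients)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    cost = np.array([mse(ind, x, y) for ind in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = a + f * (b - c)             # differential mutation
            cross = rng.random(dim) < cr         # binomial crossover mask
            cross[rng.integers(dim)] = True      # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            c_trial = mse(trial, x, y)
            if c_trial <= cost[i]:               # greedy one-to-one selection
                pop[i], cost[i] = trial, c_trial
    best = int(np.argmin(cost))
    return pop[best], cost[best]

# Tiny regression demo: fit y = sin(2x) on [-1, 1]
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(40, 1))
y = np.sin(2.0 * x[:, 0])
n_in, n_hidden = 1, 4
dim = n_in * n_hidden + n_hidden + n_hidden + 1   # 13 weights in total
w_best, err = de_train(x, y, dim)
```

Encoding all network weights as one flat vector is what makes any population-based black-box optimizer applicable: the objective is simply the training error, so non-differentiable variants (e.g. with injected noise) pose no difficulty.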