Abstract—In this paper we present and evaluate a novel algorithm for ensemble creation. The main idea of the algorithm is to first independently train a fixed number of neural networks (here ten) and then use genetic programming to combine these networks into an ensemble. The use of genetic programming makes it possible not only to consider ensembles of different sizes, but also to use ensembles as intermediate building blocks. The final result is therefore more accurately described as an ensemble of neural network ensembles. The experiments show that the proposed method, when evaluated on 22 publicly available data sets, obtains very high accuracy, clearly outperforming the other methods evaluated. This study employs several micro techniques, and we believe that they all contribute to the increased performance. One such micro technique, aimed at reducing overtraining, is tombola training, the training method used during genetic evolution. In tombola training, the training data is regularly resampled into new parts, called training groups. Each ensemble is then evaluated on every training group, and its fitness is determined solely by its result on the hardest group.

I. INTRODUCTION

When performing predictive classification, the primary goal is to obtain high accuracy; i.e., few misclassifications when the model is applied to novel data. With this in mind, Artificial Neural Networks (ANNs) are often the technique of choice when there is no explicit demand for transparent models. ANNs normally produce very accurate models on most data sets and have been used successfully in a variety of domains. Within the research community it is, however, also well known that the use of ANN ensembles often results in even higher accuracy; see e.g. [1] and [2]. Despite this, the use of ensembles in applications is still limited.
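The tombola-training idea sketched in the abstract can be illustrated in a few lines of Python. This is only a hedged sketch of the resample-and-score principle, not the paper's implementation: the names `tombola_fitness`, `ensemble_predict`, and `n_groups` are illustrative, and the sketch assumes a simple accuracy score per group.

```python
import random

def tombola_fitness(ensemble_predict, X, y, n_groups=3, rng=None):
    """Resample the training data into groups ("training groups") and
    return the ensemble's accuracy on the hardest (worst-scoring) group.

    Sketch of the tombola-training principle: regrouping is meant to be
    repeated regularly during evolution, so a new rng-driven shuffle is
    performed on every call."""
    rng = rng or random.Random()
    indices = list(range(len(X)))
    rng.shuffle(indices)
    # Partition the shuffled indices into n_groups training groups.
    groups = [indices[i::n_groups] for i in range(n_groups)]
    scores = []
    for group in groups:
        correct = sum(1 for i in group if ensemble_predict(X[i]) == y[i])
        scores.append(correct / len(group))
    # Fitness is determined solely by the result on the hardest group.
    return min(scores)
```

Taking the minimum over groups penalizes ensembles that only do well on one particular partition of the data, which is the stated motivation: reducing overtraining during genetic evolution.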
Two possible reasons for this are insufficient knowledge about the benefits of using ensembles and limited support in most data mining tools. In addition, even when ensembles are used, very simple variants are often preferred. A typical choice would be to train a fixed number (e.g., five or ten) of ANNs with identical topology and simply average their outputs.

U. Johansson is with the School of Business and Informatics, University of Borås, SE-501 90 Borås, Sweden. (phone: +46 (0)33 – 4354489; email: ulf.johansson@hb.se).
T. Löfström is with the School of Business and Informatics, University of Borås, Sweden. (email: tuve.lofstrom@hb.se).
R. König is with the School of Business and Informatics, University of Borås, Sweden. (email: rikard.konig@hb.se).
L. Niklasson is with the School of Humanities and Informatics, University of Skövde, Sweden. (email: lars.niklasson@his.se).

In this paper we suggest and evaluate a novel, rather technical algorithm for the construction of ANN ensembles, called GEMS (Genetic Ensemble Member Selection). The algorithm uses genetic programming to actively search among possible ensembles, built from a pool of trained ANNs. Although GEMS has a multitude of parameters, the basic principle is easy to understand. We do not at this stage claim to be anywhere near optimal use of the algorithm, so this study should be seen as a demonstration of its potential. With this in mind, the main purpose is to evaluate GEMS on a large number of data sets in order to establish a lower bound for the level of accuracy to expect.

II. BACKGROUND AND RELATED WORK

Any algorithm aimed at building ensembles must somehow both train individual models and combine them into the actual ensemble. Standard techniques like bagging, introduced by Breiman [3], and boosting, introduced by Schapire [4], rely on resampling to obtain a different training set for each classifier. Bagging repeatedly samples (with replacement) from a data set according to a uniform probability distribution.
Each bootstrap sample has the same size as the original data set and is used to train one classifier. After all classifiers have been trained, a majority vote is normally used when classifying a novel instance. Boosting is an iterative procedure in which the distribution of training examples is adaptively changed to make the classifiers focus on examples that are hard to classify. Boosting assigns a weight to each training example, and this weight is updated depending on whether or not the current classifier classified the example correctly. Naturally, incorrectly classified examples have their weights increased, while correctly classified examples have their weights decreased. The final ensemble is obtained by combining the classifiers from all iterations. Boosting algorithms typically differ in how they update the weights and how predictions from the base classifiers are combined into the ensemble prediction. Both bagging and boosting can be applied to ANNs, although they are more commonly used with decision trees; see e.g. [5]. Another option is to train a number of classifiers independently (most often on the same data) and then either combine all classifiers or select a subset to form the actual ensemble.

Building Neural Network Ensembles using Genetic Programming
Ulf Johansson, Tuve Löfström, Rikard König and Lars Niklasson
2006 International Joint Conference on Neural Networks, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, July 16-21, 2006
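The bagging procedure described above can be sketched as follows. This is a generic illustration of Breiman's scheme, not code from the paper: the base learner passed in as `fit` is arbitrary (here a trivial majority-class learner is used in the example), since bagging is independent of the choice of classifier.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement, uniform probability."""
    return [rng.choice(data) for _ in data]

def bag(train, n_classifiers, fit, rng=None):
    """Train one classifier per bootstrap sample; each sample has the
    same size as the original training set."""
    rng = rng or random.Random()
    return [fit(bootstrap_sample(train, rng)) for _ in range(n_classifiers)]

def majority_vote(classifiers, x):
    """Classify a novel instance by majority vote over the ensemble."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]
```

A usage sketch with a toy base learner: `fit` could simply memorize the majority label of its bootstrap sample and predict it for every instance; `bag(train, 10, fit)` then yields ten such classifiers, and `majority_vote` combines them exactly as described for bagging above.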