Neural Computing & Applications manuscript No. (will be inserted by the editor)

A Novel Weight Pruning Method for MLP Classifiers Based on the MAXCORE Principle

Cláudio M. S. Medeiros · Guilherme A. Barreto

Received: date / Accepted: date

Abstract We introduce a novel weight pruning methodology for MLP classifiers that can be used for model and/or feature selection purposes. The main concept underlying the proposed method is the MAXCORE principle, which is based on the observation that relevant synaptic weights tend to generate higher correlations between the error signals associated with the neurons of a given layer and the error signals propagated back to the previous layer, whereas nonrelevant (i.e. prunable) weights tend to generate smaller correlations. Using MAXCORE as a guiding principle, we perform a cross-correlation analysis of the error signals at successive layers. Weights for which the cross-correlations are smaller than a user-defined error tolerance are gradually discarded. Computer simulations using synthetic and real-world data sets show that the proposed method performs consistently better than standard pruning techniques, at much lower computational cost.

Keywords MLP classifier · Backpropagation algorithm · Weight pruning · Feature selection

Cláudio M. S. Medeiros
Federal Institute of Ceará, Department of Industry, Av. Treze de Maio, 2081 - Campus of Benfica, CEP 60040-531, Fortaleza, Ceará, Brazil
E-mail: claudiosa@ifce.edu.br

Guilherme A. Barreto
Federal University of Ceará, Department of Teleinformatics Engineering, Av. Mister Hull, S/N - Campus of Pici, Center of Technology, CP 6005, CEP 60455-970, Fortaleza, Ceará, Brazil
E-mail: guilherme@deti.ufc.br

1 Introduction

Even though more than two and a half decades have passed since the rediscovery of the back-propagation algorithm in the mid-1980s, and despite all the available literature on the MLP network, a beginner soon becomes aware of the difficulties in finding a suitable architecture for real-world applications. In fact, this is a hard task even for an experienced practitioner. An architecture that is too small will not be able to learn from the data properly, no matter what training algorithm is used for this purpose. An architecture with too many input units and hidden layers/neurons is prone to learning undesirable characteristics (e.g. noise) of the training data.

Hence, a crucial step in the design of an MLP is the network model selection problem (Bishop, 1995). This problem is still a research topic of interest (Gómez et al, 2009; Delogu et al, 2008; Trenn, 2008; Seghouane and Amari, 2007; Curry and Morgan, 2006; Nakamura et al, 2006; Xiang et al, 2005), and can be roughly defined as the task of finding the smallest architecture that generalizes well, making good predictions for new data. Generalization can be assessed by changing the number of adjustable parameters (weights and biases) associated with the input units and hidden/output neurons. Among the several ways to implement this in practice, we list the following four as possibly the most common approaches.

Exhaustive search plus early stopping: early stopping of training is a method that aims to prevent overtraining due to an oversized network, noisy training examples, or a small training set (Cataltepe et al, 1999). The performances of several networks with different numbers of features, hidden layers and hidden neurons are evaluated during training on an independent validation set. Training of each network is stopped as soon as its generalization error begins to increase. The optimal architecture is the one providing the smallest generalization error.
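The early-stopping rule described above can be sketched as follows. This is a minimal illustration, not part of the paper's method: the validation-error curve is synthetic, and the `patience` parameter (how many epochs without improvement to tolerate before stopping) is an assumption of this sketch; in practice the errors would be computed on a held-out validation set after each training epoch.

```python
def early_stop_epoch(val_errors, patience=3):
    """Return the epoch index of the lowest validation error seen
    before `patience` consecutive epochs pass without improvement,
    i.e. the point at which training would be stopped."""
    best_err = float("inf")
    best_epoch = 0
    since_best = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            # Validation error improved: record it and reset the counter.
            best_err, best_epoch, since_best = err, epoch, 0
        else:
            # No improvement this epoch; stop if patience is exhausted.
            since_best += 1
            if since_best >= patience:
                break
    return best_epoch

# Synthetic U-shaped validation curve: the error falls while the network
# learns, then rises once it starts fitting noise (overtraining).
curve = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.63]
print(early_stop_epoch(curve))  # -> 3 (the epoch with minimum error 0.40)
```

Running this procedure once per candidate architecture, and keeping the architecture whose stopped network attains the smallest validation error, implements the exhaustive-search-plus-early-stopping approach.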
Constructive algorithms: network training starts with a small number of hidden neurons, and neurons are added during the training process, with the goal of arriving at an optimal network structure (Aran et al, 2009; Parekh et al, 2000). This is the approach behind the Cascade-Correlation network (Wan