Neural network training and rule extraction with augmented discretized input

Yoichi Hayashi (a), Rudy Setiono (b), Arnulfo Azcarraga (c)

(a) Department of Computer Science, Meiji University, Tama-ku, Kawasaki 214-8571, Japan
(b) School of Computing, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore
(c) College of Computer Studies, De La Salle University, 2401 Taft Avenue, Manila, Philippines

Article history: Received 4 December 2015; Received in revised form 1 March 2016; Accepted 12 May 2016; Available online 2 June 2016. Communicated by S. Mitra.

Keywords: Supervised classification; Network pruning; Discretization; Rule extraction

Abstract

The classification and prediction accuracy of neural networks can be improved when they are trained with discretized continuous attributes as additional inputs. Such input augmentation makes it easier for the network weights to form accurate decision boundaries when the data samples of different classes are contained in distinct hyper-rectangular subregions of the original input space. In this paper, we first present how a neural network can be trained with augmented discretized inputs. The additional inputs are obtained by dividing the original interval of each continuous attribute into subintervals of equal length. The network is then pruned to remove most of the discretized inputs, as well as the original continuous attributes, as long as the network still meets a preset minimum accuracy requirement. We then discuss how comprehensible classification rules can be extracted from the pruned network by analyzing the activations of the hidden units and the weights of the connections that remain after pruning. Our experiments on artificial data sets show that the rules extracted from the neural networks perfectly replicate the class membership rules used to create the data.
On real-life benchmark data sets, neural networks trained with augmented discretized inputs are shown to achieve better accuracy than neural networks trained with the original data.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Pattern classification is one of the most important tasks in data analysis. Among the many machine learning tools that have proven effective for solving pattern classification problems are artificial neural networks. Successful applications of neural networks in a wide array of domains such as engineering, medicine, business and the social sciences have been widely reported in the literature [1,22,29,34]. Successful applications in business include forecasting cash demand in ATMs [58], predicting financial distress among listed Chinese companies [19], developing an early warning system to predict currency crises [51] and building consumer credit scoring models [59]. Neural networks have also proven to be very useful tools for time-series forecasting: they have been shown to be more accurate than conventional statistical methods when used to predict new technology product demand [5] and foreign exchange rates [35,43].

Trained neural networks are often pruned to improve their generalization capability [40]. Networks with fewer hidden units and connections normally have lower prediction errors than more complex networks [60,61]. By removing redundant and irrelevant connections, the network becomes less likely to overfit the training samples. Another benefit of pruning is a simpler network structure, which makes the task of extracting comprehensible rules from the network easier. By generating classification rules that are comprehensible to users, the knowledge embedded in the pruned network can be examined, interpreted and verified by people who are not necessarily machine learning experts.
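The pruning loop just described can be sketched in a few lines. This is a minimal illustration only: the names (`prune_weights`, `accuracy_fn`) and the smallest-magnitude removal criterion are assumptions made here for clarity, not the exact selection criterion used in the paper; what the sketch shares with the paper's procedure is that connections are removed only while the network still meets a preset minimum accuracy.

```python
import numpy as np

def prune_weights(weights, accuracy_fn, min_accuracy):
    """Iteratively zero out the smallest-magnitude remaining weight,
    keeping each removal only while accuracy stays above the preset
    minimum. Stops (and undoes the last removal) at the first failure."""
    w = weights.copy()
    while True:
        nonzero = np.flatnonzero(w)
        if nonzero.size == 0:
            return w  # nothing left to prune
        # candidate for removal: smallest weight by magnitude
        idx = nonzero[np.argmin(np.abs(w[nonzero]))]
        saved = w[idx]
        w[idx] = 0.0
        if accuracy_fn(w) < min_accuracy:
            w[idx] = saved  # this removal hurt accuracy: undo and stop
            return w
```

In practice `accuracy_fn` would re-evaluate the pruned network on the training set; a toy stand-in (e.g. one that fails once too much total weight mass is removed) is enough to exercise the loop.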
With this goal in mind, algorithms that extract rules from neural networks have been proposed [6,8,15,33,44,57]. Many researchers have recently reported new developments and applications of algorithms that extract rules from trained neural networks. Novel applications of neural network rule extraction include identifying factors responsible for air pollution levels [13], determining the quality of cotton yarn [2], detecting faults in a transformer [9], recognizing

http://dx.doi.org/10.1016/j.neucom.2016.05.040
E-mail addresses: hayashiy@cs.meiji.jp (Y. Hayashi), rudys@comp.nus.edu.sg (R. Setiono), arnie.azcarraga@delasalle.ph (A. Azcarraga).
Neurocomputing 207 (2016) 610–622
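The input augmentation outlined in the abstract, dividing each continuous attribute's range into equal-length subintervals and feeding the resulting discrete indicators to the network alongside the original attributes, can be sketched as follows. The function name, the one-hot encoding of the bin index, and the choice of four bins are illustrative assumptions; the paper's exact encoding of the discretized inputs may differ.

```python
import numpy as np

def augment_with_equal_width_bins(X, n_bins=4):
    """Append, for each continuous attribute, one-hot indicators of the
    equal-width subinterval its value falls into. Returns the original
    attributes followed by the discretized extra inputs."""
    X = np.asarray(X, dtype=float)
    extras = []
    for j in range(X.shape[1]):
        col = X[:, j]
        lo, hi = col.min(), col.max()
        # equal-length subintervals; guard a degenerate constant column
        width = (hi - lo) / n_bins if hi > lo else 1.0
        idx = np.minimum(((col - lo) / width).astype(int), n_bins - 1)
        extras.append(np.eye(n_bins)[idx])  # one-hot bin membership
    return np.hstack([X] + extras)
```

A data set with one continuous attribute and four bins thus becomes a five-input problem: the original value plus four binary subinterval indicators, which is what lets the subsequent pruning step discard either representation independently.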