Research Article
A Comparison Study on Rule Extraction from
Neural Network Ensembles, Boosted Shallow Trees, and SVMs
Guido Bologna¹,² and Yoichi Hayashi³
¹Department of Computer Science, University of Applied Sciences and Arts Western Switzerland, Rue de la Prairie 4, 1202 Geneva, Switzerland
²Department of Computer Science, University of Geneva, Route de Drize 7, 1227 Carouge, Switzerland
³Department of Computer Science, Meiji University, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
Correspondence should be addressed to Guido Bologna; guido.bologna@hesge.ch
Received 27 July 2017; Revised 17 November 2017; Accepted 4 December 2017; Published 9 January 2018
Academic Editor: Erich Peter Klement
Copyright © 2018 Guido Bologna and Yoichi Hayashi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
One way to make the knowledge stored in an artificial neural network more intelligible is to extract symbolic rules. However, producing rules from Multilayer Perceptrons (MLPs) is an NP-hard problem. Many techniques have been introduced to generate rules from single neural networks, but very few have been proposed for ensembles. Moreover, experiments were rarely assessed by 10-fold cross-validation trials. In this work, based on the Discretized Interpretable Multilayer Perceptron (DIMLP), experiments were performed on 10 repetitions of stratified 10-fold cross-validation trials over 25 binary classification problems. The DIMLP architecture allowed us to produce rules from DIMLP ensembles, boosted shallow trees (BSTs), and Support Vector Machines (SVMs). The complexity of rulesets was measured by the average number of generated rules and the average number of antecedents per rule. Across the 25 classification problems, the most complex rulesets were generated from BSTs trained by "gentle boosting" and "real boosting." Moreover, we clearly observed that the less complex the rules were, the better their fidelity was. In fact, rules generated from decision stumps trained by modest boosting were, for almost all the 25 datasets, the simplest with the highest fidelity. Finally, in terms of average predictive accuracy and average ruleset complexity, our results proved competitive with those reported in the literature.
1. Introduction
The explanation of neural network responses is essential for their acceptance. For example, physicians cannot trust any model without some form of explanation. An intuitive way to give insight into the knowledge embedded within neural network connections and neuron activations is to extract symbolic rules. However, producing rules from Multilayer Perceptrons (MLPs) is an NP-hard problem [1].
In the context of classification, the format of a symbolic rule is as follows: "if tests on antecedents are true then class c," where "tests on antecedents" are of the form x ≤ t or x ≥ t, with x an input variable and t a real number. Class c designates a class among several possible
classes. The complexity of the extracted rules is often described with two parameters: the number of rules and the number of antecedents per rule. Rulesets of low complexity are preferred over those of high complexity, since at first sight fewer rules and fewer antecedents are more easily understood. A further reason for this preference is that rule bases of lower complexity also reduce the risk of overfitting on new data. Nevertheless, Freitas clarified that the comprehensibility of rules is not necessarily related to a small number of rules [2]. He proposed a new measure, denoted the prediction-explanation size, which strongly depends on the average number of antecedents per rule. Another measure of rule transparency is consistency. Specifically, an extracted ruleset is deemed consistent if, under different training sessions, the rule extraction algorithm produces rulesets that classify samples into the same classes. Finally, a rule is redundant if it conveys the same information as, or less general information than, another rule.
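The rule format and complexity measures described above can be sketched in code. The following is a minimal, hypothetical illustration (the `Rule` class, its field names, and the example thresholds are ours, not part of the original method): each rule is a conjunction of threshold tests x ≤ t or x ≥ t followed by a class label, and ruleset complexity is summarized by the number of rules and the average number of antecedents per rule.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Rule:
    # Each antecedent is (variable_index, operator, threshold),
    # with operator restricted to "<=" or ">=".
    antecedents: List[Tuple[int, str, float]]
    label: int  # the class designated by the rule

    def covers(self, x: List[float]) -> bool:
        # The rule fires only if every antecedent test is true.
        return all(
            x[i] <= t if op == "<=" else x[i] >= t
            for i, op, t in self.antecedents
        )

def ruleset_complexity(rules: List[Rule]) -> Tuple[int, float]:
    """Return (number of rules, average antecedents per rule)."""
    n = len(rules)
    avg = sum(len(r.antecedents) for r in rules) / n
    return n, avg

# Illustrative two-rule ruleset with made-up thresholds.
ruleset = [
    Rule(antecedents=[(0, "<=", 0.5), (2, ">=", 1.0)], label=1),
    Rule(antecedents=[(1, ">=", 3.0)], label=0),
]
print(ruleset_complexity(ruleset))        # (2, 1.5)
print(ruleset[0].covers([0.2, 0.0, 1.5])) # True
```

Under this representation, the two complexity parameters of the paper correspond directly to the tuple returned by `ruleset_complexity`.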
Hindawi
Applied Computational Intelligence and Soft Computing
Volume 2018, Article ID 4084850, 20 pages
https://doi.org/10.1155/2018/4084850