Research Article
A Comparison Study on Rule Extraction from
Neural Network Ensembles, Boosted Shallow Trees, and SVMs
Guido Bologna¹,² and Yoichi Hayashi³
¹Department of Computer Science, University of Applied Sciences and Arts Western Switzerland, Rue de la Prairie 4, 1202 Geneva, Switzerland
²Department of Computer Science, University of Geneva, Route de Drize 7, 1227 Carouge, Switzerland
³Department of Computer Science, Meiji University, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
Correspondence should be addressed to Guido Bologna; guido.bologna@hesge.ch
Received 27 July 2017; Revised 17 November 2017; Accepted 4 December 2017; Published 9 January 2018
Academic Editor: Erich Peter Klement
Copyright © 2018 Guido Bologna and Yoichi Hayashi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
One way to make the knowledge stored in an artificial neural network more intelligible is to extract symbolic rules. However, producing rules from Multilayer Perceptrons (MLPs) is an NP-hard problem. Many techniques have been introduced to generate rules from single neural networks, but very few have been proposed for ensembles. Moreover, experiments were rarely assessed by 10-fold cross-validation trials. In this work, based on the Discretized Interpretable Multilayer Perceptron (DIMLP), experiments were performed on 10 repetitions of stratified 10-fold cross-validation trials over 25 binary classification problems. The DIMLP architecture allowed us to produce rules from DIMLP ensembles, boosted shallow trees (BSTs), and Support Vector Machines (SVMs). The complexity of rulesets was measured by the average number of generated rules and the average number of antecedents per rule. Across the 25 classification problems, the most complex rulesets were generated from BSTs trained by "gentle boosting" and "real boosting." Moreover, we clearly observed that the less complex the rules were, the better their fidelity was. In fact, rules generated from decision stumps trained by modest boosting were, for almost all the 25 datasets, the simplest with the highest fidelity. Finally, in terms of average predictive accuracy and average ruleset complexity, our results proved competitive with those reported in the literature.
1. Introduction
The explanation of neural network responses is essential for their acceptance. For example, physicians cannot trust any model without some form of explanation. An intuitive way to give insight into the knowledge embedded within neural network connections and neuron activations is to extract symbolic rules. However, producing rules from Multilayer Perceptrons (MLPs) is an NP-hard problem [1].
In the context of classification, the format of a symbolic rule is as follows: "if tests on antecedents are true then class c," where "tests on antecedents" are of the form x ≤ t or x ≥ t, with x an input variable and t a real number. Class c designates a class among several possible
classes. The complexity of the extracted rules is often described with two parameters: the number of rules and the number of antecedents per rule. Rulesets of low complexity are preferred over those of high complexity, since at first sight fewer rules and fewer antecedents are more easily understood. A further reason for this preference is that rule bases of lower complexity also reduce the risk of overfitting on new data. Nevertheless, Freitas clarified that the comprehensibility of rules is not necessarily related to a small number of rules [2]. He proposed a new measure, denoted the prediction-explanation size, which strongly depends on the average number of antecedents per rule. Another measure of rule transparency is consistency. Specifically, an extracted ruleset is deemed consistent if, under different training sessions, the rule extraction algorithm produces rulesets that classify samples into the same classes. Finally, a rule is redundant if it conveys the same information as, or less general information than, another rule.
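The rule format and complexity measures described above can be sketched in code. The following is a minimal, hypothetical illustration (the `Rule` class, its field names, and the example thresholds are ours, not part of the original method): each rule is a conjunction of threshold tests x ≤ t or x ≥ t followed by a class label, and ruleset complexity is summarized by the number of rules and the average number of antecedents per rule.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Rule:
    # Each antecedent is (variable_index, operator, threshold),
    # with operator restricted to "<=" or ">=".
    antecedents: List[Tuple[int, str, float]]
    label: int  # the class designated by the rule

    def covers(self, x: List[float]) -> bool:
        # The rule fires only if every antecedent test is true.
        return all(
            x[i] <= t if op == "<=" else x[i] >= t
            for i, op, t in self.antecedents
        )

def ruleset_complexity(rules: List[Rule]) -> Tuple[int, float]:
    """Return (number of rules, average antecedents per rule)."""
    n = len(rules)
    avg = sum(len(r.antecedents) for r in rules) / n
    return n, avg

# Illustrative two-rule ruleset with made-up thresholds.
ruleset = [
    Rule(antecedents=[(0, "<=", 0.5), (2, ">=", 1.0)], label=1),
    Rule(antecedents=[(1, ">=", 3.0)], label=0),
]
print(ruleset_complexity(ruleset))        # (2, 1.5)
print(ruleset[0].covers([0.2, 0.0, 1.5])) # True
```

Under this representation, the two complexity parameters of the paper correspond directly to the tuple returned by `ruleset_complexity`.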
Hindawi
Applied Computational Intelligence and Soft Computing
Volume 2018, Article ID 4084850, 20 pages
https://doi.org/10.1155/2018/4084850