666 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 19, NO. 4, AUGUST 2011
A Fast and Scalable Multiobjective Genetic Fuzzy
System for Linguistic Fuzzy Modeling in
High-Dimensional Regression Problems
Rafael Alcalá, María José Gacto, and Francisco Herrera, Member, IEEE
Abstract—Linguistic fuzzy modeling in high-dimensional regres-
sion problems poses the challenge of exponential-rule explosion
when the number of variables and/or instances becomes high. One
way to address this problem is to determine the variables used,
the linguistic partitioning, and the rule set together, so that
only very simple, but still accurate, models are evolved. However,
evolving these components together is a difficult task that involves
a complex search space. In this study, we propose an effective
multiobjective evolutionary algorithm that, based on embedded
genetic database (DB) learning (involved variables, granularities,
and slight fuzzy-partition displacements), allows the fast learn-
ing of simple and quite-accurate linguistic models. Some efficient
mechanisms have been designed to ensure a very fast, but not pre-
mature, convergence in problems with a high number of variables.
Further, since additional problems could arise for datasets with a
large number of instances, we also propose a general mechanism
for the estimation of the model error when using evolutionary al-
gorithms, by only considering a reduced subset of the examples.
By doing so, we can also apply a fast postprocessing stage for fur-
ther refining the learned solutions. We tested our approach on 17
real-world datasets with different numbers of variables and in-
stances. Three well-known methods based on embedded genetic
DB learning have been executed as references. We compared the
different approaches by applying nonparametric statistical tests
for multiple comparisons. The results confirm the effectiveness
of the proposed method not only in terms of scalability but in
terms of the simplicity and generalizability of the obtained models
as well.
Index Terms—Embedded genetic database learning, high-
dimensional regression problems, linguistic fuzzy modeling, mul-
tiobjective genetic fuzzy systems, scalability.
I. INTRODUCTION
LINGUISTIC fuzzy modeling in high-dimensional and
large-scale regression datasets is a challenging topic since
conventional linguistic fuzzy-rule-based systems (FRBSs) suffer
from exponential-rule explosion when the number of variables
and/or data examples becomes high [1], [2]. Another problem
when we deal with high-dimensional datasets is the analysis
of algorithm scalability on big databases (DBs), emphasizing
the training time and the convergence toward compact and
interpretable models [3]. This way, we can distinguish two kinds
of problems: high dimensionality when a large number of variables
have to be considered, and scalability in datasets with a
large amount of data.
Manuscript received February 16, 2010; revised July 23, 2010, November
12, 2010, and February 14, 2011; accepted February 15, 2011. Date of
publication March 24, 2011; date of current version August 8, 2011. This work
was supported by the Spanish Ministry of Education and Science under Grant
TIN2008-06681-C06-01.
R. Alcalá and F. Herrera are with the Department of Computer Science and
Artificial Intelligence, University of Granada, 18071 Granada, Spain (e-mail:
alcala@decsai.ugr.es; herrera@decsai.ugr.es).
M. J. Gacto is with the Department of Computer Science, University of Jaén,
23071 Jaén, Spain (e-mail: mjgacto@ugr.es).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TFUZZ.2011.2131657
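To make the exponential-rule explosion concrete, the following minimal Python sketch counts the antecedent combinations of a complete grid-type rule base, assuming each input variable is partitioned into a fixed number of linguistic terms. The function name max_grid_rules and the choice of five terms per variable are illustrative assumptions and are not taken from the paper.

```python
from math import prod


def max_grid_rules(granularities):
    """Upper bound on the number of rules in a complete grid partition.

    granularities: number of linguistic terms used for each input variable.
    Every combination of one term per variable is a possible antecedent.
    """
    return prod(granularities)


# Illustrative setting: five linguistic terms for every input variable.
for n_inputs in (2, 5, 10, 20):
    print(n_inputs, max_grid_rules([5] * n_inputs))
```

With five terms per variable, ten inputs already allow 9,765,625 antecedent combinations, which is why selecting variables and reducing granularities has such a strong effect on model complexity.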
A good way to address both problems is to search for a
good and simple global structure within the same process, so
that the relationships among the different components defining
the knowledge base (KB) of the obtained linguistic models are
taken into account. That is, the two main components of the KB,
a DB containing the definitions of the linguistic fuzzy partitions
and a rule base (RB) containing the associated set of rules,
are learned together. Since this method involves using different
coding schemes to represent each solution, evolutionary algorithms,
particularly genetic algorithms (GAs), are useful for this task.
These kinds of global-search techniques have been successfully
applied to learn fuzzy systems in recent years, thus giving rise
to the so-called genetic fuzzy systems (GFSs) [3]–[5]. Further-
more, the application of multiobjective evolutionary algorithms
(MOEAs) to the derivation of compact linguistic FRBSs is a
prolific framework in which we can find several interesting and
recent works. Some MOEAs were proposed as postprocessing
techniques [6]–[13], while others were proposed as learning
techniques [11], [14]–[18].
However, this method involves many components/parameters
that should be determined together: the selection
of important variables, the determination of a good number of
linguistic terms or granularities per variable, the parametric
definition of the membership functions (MFs), and the associated
set of rules. This is a difficult task, since it involves using
different coding schemes to represent a complete solution and,
therefore, a very complex search space. In fact, the balance among
problem size, algorithm scalability, and solution quality is an
important topic for GFSs that is worth studying in depth [3], and
it has not been directly taken into account in the aforementioned
evolutionary approaches devoted to linguistic fuzzy modeling.
An efficient way to obtain the entire KB of an FRBS is to
obtain the DB and the RB within the same process but separately,
as in embedded genetic DB learning [19]–[24]. This is
an evolutionary process that learns the DB and wraps a simple
method to derive a set of rules for each DB definition. This
allows the most adequate context [20] for each fuzzy partition
to be learned, which strongly affects the final model complexity.
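The wrapping idea described above can be sketched in Python as follows. This is a minimal illustration under several assumptions that do not come from the paper: inputs and output are scaled to [0, 1], each DB candidate is just a list of granularities with uniform triangular partitions, the wrapped rule learner is a Wang-Mendel-style procedure, and a plain random search with a lexicographic (error, number of rules) comparison stands in for the multiobjective evolutionary algorithm. All function names are hypothetical.

```python
import random


def memberships(x, g):
    """Degrees of x in [0, 1] for g uniform triangular labels (g >= 2)."""
    return [max(0.0, 1.0 - abs(x - k / (g - 1)) * (g - 1)) for k in range(g)]


def best_label(x, g):
    """Index and degree of the label that covers x best."""
    mu = memberships(x, g)
    k = max(range(g), key=mu.__getitem__)
    return k, mu[k]


def derive_rules(X, y, grans, g_out):
    """Wrapped rule learner (Wang-Mendel-style; an assumption of this sketch)."""
    rules = {}  # antecedent labels -> (consequent label, covering degree)
    for xs, target in zip(X, y):
        labels, degree = [], 1.0
        for x, g in zip(xs, grans):
            k, mu = best_label(x, g)
            labels.append(k)
            degree *= mu
        k_out, mu_out = best_label(target, g_out)
        degree *= mu_out
        key = tuple(labels)
        # Keep, for each antecedent, the consequent with the highest degree.
        if key not in rules or degree > rules[key][1]:
            rules[key] = (k_out, degree)
    return {ante: cons for ante, (cons, _) in rules.items()}


def predict(xs, rules, grans, g_out, default):
    """Weighted average of consequent centers, weighted by rule matching."""
    mus = [memberships(x, g) for x, g in zip(xs, grans)]
    num = den = 0.0
    for ante, cons in rules.items():
        match = 1.0
        for mu, k in zip(mus, ante):
            match *= mu[k]
        num += match * cons / (g_out - 1)
        den += match
    return num / den if den > 0 else default


def evaluate(X, y, grans, g_out=5):
    """Derive an RB for this DB definition; return (MSE, number of rules)."""
    rules = derive_rules(X, y, grans, g_out)
    default = sum(y) / len(y)
    mse = sum((predict(xs, rules, grans, g_out, default) - t) ** 2
              for xs, t in zip(X, y)) / len(y)
    return mse, len(rules)


def search_db(X, y, n_iter=30, seed=0):
    """Random search over granularities (stand-in for the evolutionary search)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        grans = [rng.randint(2, 7) for _ in range(len(X[0]))]
        mse, n_rules = evaluate(X, y, grans)
        if best is None or (mse, n_rules) < best[:2]:
            best = (mse, n_rules, grans)
    return best


# Toy regression problem with two inputs, used only for illustration.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [0.5 * (a + b) for a, b in X]
mse, n_rules, grans = search_db(X, y)
print(grans, n_rules, round(mse, 4))
```

The point of the sketch is the structure: the outer search only encodes the DB, and the rule base is regenerated by the wrapped method for every DB candidate, so the quality of a DB definition is always judged through the rules it induces.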