IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 19, NO. 4, AUGUST 2011

A Fast and Scalable Multiobjective Genetic Fuzzy System for Linguistic Fuzzy Modeling in High-Dimensional Regression Problems

Rafael Alcalá, María José Gacto, and Francisco Herrera, Member, IEEE

Abstract—Linguistic fuzzy modeling in high-dimensional regression problems poses the challenge of exponential-rule explosion when the number of variables and/or instances becomes high. One way to address this problem is by determining the used variables, the linguistic partitioning and the rule set together, in order to only evolve very simple, but still accurate models. However, evolving these components together is a difficult task, which involves a complex search space. In this study, we propose an effective multiobjective evolutionary algorithm that, based on embedded genetic database (DB) learning (involved variables, granularities, and slight fuzzy-partition displacements), allows the fast learning of simple and quite-accurate linguistic models. Some efficient mechanisms have been designed to ensure a very fast, but not premature, convergence in problems with a high number of variables. Further, since additional problems could arise for datasets with a large number of instances, we also propose a general mechanism for the estimation of the model error when using evolutionary algorithms, by only considering a reduced subset of the examples. By doing so, we can also apply a fast postprocessing stage for further refining the learned solutions. We tested our approach on 17 real-world datasets with different numbers of variables and instances. Three well-known methods based on embedded genetic DB learning have been executed as references. We compared the different approaches by applying nonparametric statistical tests for multiple comparisons.
The results confirm the effectiveness of the proposed method not only in terms of scalability but in terms of the simplicity and generalizability of the obtained models as well.

Index Terms—Embedded genetic database learning, high-dimensional regression problems, linguistic fuzzy modeling, multiobjective genetic fuzzy systems, scalability.

Manuscript received February 16, 2010; revised July 23, 2010, November 12, 2010, and February 14, 2011; accepted February 15, 2011. Date of publication March 24, 2011; date of current version August 8, 2011. This work was supported by the Spanish Ministry of Education and Science under Grant TIN2008-06681-C06-01.
R. Alcalá and F. Herrera are with the Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain (e-mail: alcala@decsai.ugr.es; herrera@decsai.ugr.es).
M. J. Gacto is with the Department of Computer Science, University of Jaén, 23071 Jaén, Spain (e-mail: mjgacto@ugr.es).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TFUZZ.2011.2131657

I. INTRODUCTION

Linguistic fuzzy modeling in high-dimensional and large-scale regression datasets is a challenging topic since conventional linguistic fuzzy-rule-based systems (FRBSs) suffer from exponential-rule explosion when the number of variables and/or data examples becomes high [1], [2]. Another problem when we deal with high-dimensional datasets is the analysis of algorithm scalability on big databases (DBs), emphasizing the training time and the convergence toward compact and interpretable models [3]. This way, we can distinguish two kinds of problems: high dimensionality when a large number of variables have to be considered, and scalability in datasets with a large amount of data.
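The exponential-rule explosion mentioned above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a complete grid partition in which every combination of input labels may appear as a rule antecedent; the function name `max_grid_rules` and the choice of five labels per variable are illustrative assumptions, not values taken from the paper.

```python
from math import prod  # available in Python 3.8+


def max_grid_rules(granularities):
    """Upper bound on the number of distinct rule antecedents in a grid
    partition: the product of the number of labels of each input variable."""
    return prod(granularities)


# Hypothetical setting: five linguistic terms on every input variable.
for n_vars in (2, 5, 10, 20):
    print(n_vars, "variables:", max_grid_rules([5] * n_vars), "possible antecedents")
```

Under this assumption, ten inputs with five labels each already admit 9,765,625 antecedent combinations, which is why discarding variables and lowering granularities has such a strong effect on model complexity.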
A good way to address both problems is to search for a good and simple global structure within a single process, so that the relationships among the different components defining the knowledge base (KB) of the obtained linguistic models are taken into account, i.e., by jointly learning the two main components of the KB: a DB containing the definitions of the linguistic fuzzy partitions and a rule base (RB) containing the associated set of rules. Since this method involves using different coding schemes to represent each solution, evolutionary algorithms, particularly genetic algorithms (GAs), are useful for this task. These kinds of global-search techniques have been successfully applied to learn fuzzy systems in recent years, thus giving rise to the so-called genetic fuzzy systems (GFSs) [3]–[5]. Furthermore, the application of multiobjective evolutionary algorithms (MOEAs) to the derivation of compact linguistic FRBSs is a prolific framework in which we can find several interesting and recent works. Some MOEAs were proposed as postprocessing techniques [6]–[13], while others were proposed as learning techniques [11], [14]–[18]. However, this method involves many components/parameters that should be determined together: the selection of important variables, the determination of a good number of linguistic terms or granularities per variable, the parametric definition of the membership functions (MFs), and the associated set of rules. Since it involves using different coding schemes to represent a complete solution and, therefore, a very complex search space, this is a difficult task. In fact, the balance among problem size, algorithm scalability, and solution quality is an important topic for GFSs that is worth studying in depth [3], and it has not been directly taken into account in the mentioned evolutionary approaches devoted to linguistic fuzzy modeling.
An efficient way to obtain the entire KB of an FRBS is to obtain the DB and the RB within the same process but separately, as is done in embedded genetic DB learning [19]–[24]. This is an evolutionary process that learns the DB and wraps a simple method to derive a set of rules for each DB definition. This enables the most adequate context [20] for each fuzzy partition to be learned, which strongly affects the final model complexity.

1063-6706/$26.00 © 2011 IEEE
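The wrapper structure of embedded genetic DB learning described above (an outer search over DB definitions, with a simple rule-derivation method run for each candidate DB) can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm proposed in the paper: a plain hill climber stands in for the multiobjective evolutionary search, the wrapped rule generator is a simplified Wang-Mendel-like procedure, partitions are uniform triangular ones, and the encoding (`None` for an unused variable, granularities 2 to 5) and all function names are hypothetical.

```python
import random


def membership(x, k, g, lo, hi):
    """Degree of x in the k-th of g uniformly spaced triangular labels on [lo, hi]."""
    width = (hi - lo) / (g - 1)
    return max(0.0, 1.0 - abs(x - (lo + k * width)) / width)


def best_label(x, g, lo, hi):
    """Index and degree of the label that best covers x."""
    return max(((k, membership(x, k, g, lo, hi)) for k in range(g)), key=lambda p: p[1])


def derive_rules(X, y, grans, out_g, ranges, out_range):
    """Wrapped rule-derivation step (simplified Wang-Mendel-like stand-in)."""
    rules = {}  # antecedent label tuple -> (rule degree, consequent label)
    for xs, t in zip(X, y):
        ante, degree = [], 1.0
        for x, g, (lo, hi) in zip(xs, grans, ranges):
            if g is None:  # variable not used in this DB definition
                ante.append(None)
                continue
            k, mu = best_label(x, g, lo, hi)
            ante.append(k)
            degree *= mu
        c, mu_out = best_label(t, out_g, *out_range)
        degree *= mu_out
        key = tuple(ante)
        if key not in rules or degree > rules[key][0]:
            rules[key] = (degree, c)  # keep the strongest rule per antecedent
    return rules


def predict(xs, rules, grans, out_g, ranges, out_range):
    """Weighted average of consequent label centers (min as the matching operator)."""
    lo_o, hi_o = out_range
    num = den = 0.0
    for ante, (_, c) in rules.items():
        w = 1.0
        for x, k, g, (lo, hi) in zip(xs, ante, grans, ranges):
            if k is not None:
                w = min(w, membership(x, k, g, lo, hi))
        num += w * (lo_o + c * (hi_o - lo_o) / (out_g - 1))
        den += w
    return num / den if den > 0 else (lo_o + hi_o) / 2


def evaluate(grans, X, y, out_g, ranges, out_range):
    """Derive the RB for this DB and return (training MSE, number of rules)."""
    rules = derive_rules(X, y, grans, out_g, ranges, out_range)
    errors = ((predict(xs, rules, grans, out_g, ranges, out_range) - t) ** 2
              for xs, t in zip(X, y))
    return sum(errors) / len(y), len(rules)


def embedded_db_search(X, y, ranges, out_range, out_g=5, iters=40, seed=0):
    """Outer loop: mutate one granularity, re-derive the rules, keep improvements."""
    rng = random.Random(seed)
    choices = [None, 2, 3, 4, 5]
    best = [3] * len(ranges)
    best_fit = evaluate(best, X, y, out_g, ranges, out_range)
    for _ in range(iters):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.choice(choices)
        fit = evaluate(cand, X, y, out_g, ranges, out_range)
        if fit < best_fit:  # lexicographic: lower MSE first, then fewer rules
            best, best_fit = cand, fit
    return best, best_fit


# Synthetic data (an assumption): the output depends only on the first input.
data_rng = random.Random(1)
X = [(data_rng.random(), data_rng.random()) for _ in range(200)]
y = [x0 ** 2 for x0, _ in X]
best, (mse, n_rules) = embedded_db_search(X, y, [(0, 1), (0, 1)], (0, 1))
print("granularities:", best, "MSE:", mse, "rules:", n_rules)
```

The point of the sketch is the division of labor: only the DB is searched directly, while the RB is always regenerated from it, so each candidate is scored as a complete KB. A multiobjective method would instead keep a set of nondominated trade-offs between error and complexity rather than the single lexicographic winner used here.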