Data-Driven Initialization and Structure Learning in Fuzzy Neural Networks

M. Setnes*, A. Koene, R. Babuška, P.M. Bruijn
Control Laboratory, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
tel: +31 15 278 3371, fax: +31 15 278 6679, email: M.Setnes@et.tudelft.nl

Abstract

Initialization and structure learning in fuzzy neural networks for data-driven rule-based modeling are discussed. Gradient-based optimization is used to fit the model to data, and a number of techniques are developed to enhance transparency of the generated rule base: data-driven initialization, similarity analysis for redundancy reduction, and evaluation of the rules' contributions. The initialization uses flexible hyper-boxes to avoid redundant and irrelevant coverage of the input space. Similarity analysis detects redundant terms, while the contribution evaluation detects irrelevant rules. Both are applied during network training for early pruning of redundant or irrelevant terms and rules, excluding them from further parameter learning (training). All steps of the modeling method are presented, and the method is illustrated on an example from the literature.

1 Introduction

Much previous work in the field of data-driven machine learning has concentrated on quantitative approximation, while paying little attention to the qualitative properties of the resulting rules. This has resulted in models that are able to reproduce input-output characteristics, but provide little insight into the actual working of the modeled process [1, 2]. The resulting rule base is often not much different from a black-box neural network model, and the rules are unsuitable for other purposes such as expert systems. Recently there have been attempts to remedy this situation by reducing the number of generated rules while maintaining a transparent rule structure. Examples of such approaches are adaptive spline modeling (ASMOD) [3] and fuzzy neural networks (FNN) with structure learning [4].
In ASMOD, reduction of the number of rules is achieved by generating global rules rather than local ones. In the FNN studied in [4], model transparency is sought through redundancy pruning: similarity analysis is applied to the learnt rules to detect redundancies. We propose using a similar idea during the learning process. By moving the redundancy detection to an earlier stage in the modeling process, less effort is spent on training redundant rules. If redundancy in terms of similar membership functions (compatible terms) is detected, these are replaced by a single membership function (a common, generalized term). This approach has proven useful in complexity reduction of fuzzy rule bases in general [5]. Further, we introduce evaluation of the rules' cumulative relative contribution to detect rules that deal with situations that do not occur in the modeled process.

The proposed initialization of the FNN is a data-driven method that utilizes flexible hyper-boxes to reduce redundancy and irrelevant coverage in the initial partitioning of the input space. The initialization can include expert knowledge in the form of partially or fully known qualitative and/or quantitative rules, and a priori information about bounds of (parts of) the input space. Figure 1a shows an overview of the process.

The modeling method described in this paper has been developed for off-line data-driven modeling of MISO systems. Extension to MIMO systems can in general be achieved by several MISO models in parallel, while on-line application requires some structural changes (see Fig. 1a).

The next section presents the FNN structure used, while Section 3 investigates the various stages in the model identification process. In Section 4, the modeling method is applied to a simple example known from the literature. Section 5 concludes the paper and gives some remarks on further studies.

* The work is partly supported by the Research Council of Norway.
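The early merging of compatible terms described above can be sketched in a few lines. The excerpt does not fix the similarity measure or the merging rule; the sketch below assumes Gaussian membership functions, the set-theoretic measure S(A, B) = |A ∩ B| / |A ∪ B| evaluated on a discretized domain, a similarity threshold of 0.7, and parameter averaging as the generalization step. All of these choices are illustrative assumptions, not necessarily those of [5].

```python
import numpy as np

def gauss_mf(x, center, sigma):
    """Gaussian membership function on a discretized domain x."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def similarity(mu_a, mu_b):
    """Set-theoretic similarity |A n B| / |A u B|, with min as
    intersection and max as union, summed over the discretized domain."""
    return np.sum(np.minimum(mu_a, mu_b)) / np.sum(np.maximum(mu_a, mu_b))

def merge_similar(terms, domain, threshold=0.7):
    """Greedily replace pairs of compatible terms (similarity above the
    threshold) by a single generalized term.  Each term is a tuple
    (center, sigma); merging here simply averages the parameters."""
    merged = list(terms)
    i = 0
    while i < len(merged):
        j = i + 1
        while j < len(merged):
            mu_i = gauss_mf(domain, *merged[i])
            mu_j = gauss_mf(domain, *merged[j])
            if similarity(mu_i, mu_j) > threshold:
                # Replace the pair by a common, generalized term.
                merged[i] = ((merged[i][0] + merged[j][0]) / 2,
                             (merged[i][1] + merged[j][1]) / 2)
                del merged[j]
            else:
                j += 1
        i += 1
    return merged
```

For example, two terms centered at 2.0 and 2.2 (both with sigma 1.0) are compatible and collapse into one term centered at 2.1, while a term centered at 8.0 is left untouched. During training, the merged terms would then be excluded from further parameter learning as separate entities.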
2 FNN network structure

The generated fuzzy rules are of the following structure:

  R_i: IF x_1 is A_{j_1} and ... and x_d is A_{j_d} and ... and x_D is A_{j_D} THEN c_i,

where R_i is the i-th rule, x_d ∈ X_d is the input in dimension d, and c_i is the (singleton) consequent value of the i-th rule. The antecedents are fuzzy sets defined on the domain of the input x_d, indexed j_d = 1, 2, ..., J_d, where J_d is the number of antecedent fuzzy sets defined on X_d. Note that the same antecedent function A_{j_d} can be used in several rules.

0-7803-4863-X/98 $10.00 ©1998 IEEE
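Given rules of this form, the network output for an input vector x is commonly computed as the firing-strength-weighted mean of the singleton consequents. A minimal sketch, assuming Gaussian antecedent membership functions and a product t-norm for the rule conjunction (the excerpt does not specify either choice, so treat both as illustrative):

```python
import numpy as np

def gauss_mf(x, center, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def fnn_output(x, rules):
    """Evaluate a rule base with singleton consequents.

    Each rule is (antecedents, c): antecedents is a list of
    (center, sigma) pairs, one per input dimension, and c is the
    singleton consequent.  Firing strength beta_i is the product of
    the antecedent membership degrees; the output is the weighted
    mean sum(beta_i * c_i) / sum(beta_i)."""
    betas, consequents = [], []
    for antecedents, c in rules:
        beta = 1.0
        for x_d, (center, sigma) in zip(x, antecedents):
            beta *= gauss_mf(x_d, center, sigma)
        betas.append(beta)
        consequents.append(c)
    betas = np.array(betas)
    return float(np.dot(betas, consequents) / np.sum(betas))
```

With two single-antecedent rules, one mapping inputs near 0 to the consequent 0.0 and one mapping inputs near 10 to 1.0, the output interpolates smoothly between the two singletons and equals 0.5 halfway between the rule centers.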