2006 Special issue

Modular learning models in forecasting natural phenomena

D.P. Solomatine*, M.B. Siek
Hydroinformatics and Knowledge Management Department, UNESCO-IHE Institute for Water Education, P.O. Box 3015, 2601 DA Delft, The Netherlands

Abstract

A modular model is a particular type of committee machine: it comprises a set of specialized (local) models, each of which is responsible for a particular region of the input space and may be trained on a subset of the training set. Many algorithms allocate such regions to local models in a fully automatic fashion. In forecasting natural processes, however, domain experts want to bring more knowledge into this allocation and to have a certain degree of control over the choice of models. This paper presents a number of approaches to building modular models based on various types of splits of the training set and on combining the models' outputs (hard splits, statistically and deterministically driven soft combinations of models, 'fuzzy committees', etc.). The issue of including a domain expert in the modeling process is also discussed, and new algorithms in the class of model trees (piecewise-linear modular regression models) are presented. A comparison of the algorithms based on modular local modeling with more traditional 'global' learning models on a number of benchmark tests and river flow forecasting problems shows the higher accuracy and transparency of the resulting models. © 2006 Elsevier Ltd. All rights reserved.

Keywords: Local models; Modular models; Committees; Neural networks; Flood forecasting

1. Introduction

Modeling in environmental and earth sciences has two main objectives: to explain a certain process, and to predict some variables characterizing this process. Models capable of predicting natural phenomena are especially valued by decision makers. During the last decade, the so-called data-driven models have gained increasing popularity.
Typically, such models use methods of computational intelligence (machine learning) to build classification or regression (numerical prediction) models linking input and output variables. They are trained on historical data describing the phenomenon in question.

Due to their complex character, most natural phenomena can be seen as composed of a number of processes; in other words, the process can be treated as a multi-stationary one. In principle, a sophisticated data-driven model trained on the whole data set describing the phenomenon can deal with such situations, and there are many examples of this, for instance in hydrologic forecasting (see Drucker, 1999; See & Openshaw, 1999). Such a model trained on the whole data set will be referred to here as a global model. There are, however, issues with the accuracy of such models. They may be quite accurate on average, that is, the root mean squared error (RMSE) may be low, but they tend to miss the extreme values (peaks or low values), which are very important to predict in practical situations, e.g. in flood forecasting. So, using a single global model for a complex process is often inadequate.

A solution is to use several models, each responsible for a particular sub-process. When a machine learning model is built, the training set can be split into a number of subsets (possibly, statistically sampled), and separate models can be trained on these subsets. It can also be said that the input space is divided into a number of subspaces or regions, for each of which a separate specialized model is built. These models are called local models, expert models, or experts (not to be confused with human domain experts), and the resulting model will be called here a modular model (MM). An MM implements just one of the possible ways to combine models.
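The hard-split idea described above can be illustrated with a minimal sketch. The synthetic piecewise process, the known split point `SPLIT`, and the helper names (`fit_local`, `modular_predict`, `rmse`) are illustrative assumptions, not the paper's algorithms; real model trees learn the splits from data:

```python
import numpy as np

# Hypothetical 1-D example: a piecewise (multi-stationary) process that
# a single global linear model cannot capture well.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)
y = np.where(x < 5.0, 2.0 * x, 20.0 - 2.0 * (x - 5.0)) + rng.normal(0, 0.1, 200)

SPLIT = 5.0  # assumed split point dividing the input space into two regions

def fit_local(xs, ys):
    """Fit a local linear model (slope, intercept) on one region's subset."""
    return np.polyfit(xs, ys, 1)

left = fit_local(x[x < SPLIT], y[x < SPLIT])    # expert for the first region
right = fit_local(x[x >= SPLIT], y[x >= SPLIT])  # expert for the second region

def modular_predict(xq):
    """Route each query point to the local model responsible for its region."""
    xq = np.asarray(xq, dtype=float)
    return np.where(xq < SPLIT, np.polyval(left, xq), np.polyval(right, xq))

glob = np.polyfit(x, y, 1)  # a single global linear model, for comparison

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

print(rmse(modular_predict(x), y))   # local experts: close to the noise level
print(rmse(np.polyval(glob, x), y))  # global model: much larger error
```

The modular model reaches an RMSE near the noise level, while the global linear fit misses the peak entirely, which mirrors the accuracy argument made in the text.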
A model consisting of multiple models whose outputs are combined is often called a committee machine (CM) (Haykin, 1999), or, in the case of solving a classification problem, a multiple classifier system (Kasabov & Song, 2002; Kuncheva, 2004). There are many variations of such models known: mixtures of experts (Jacobs, Jordan, Nowlan, & Hinton, 1991), stacked regressors (Wolpert, 1992), gated networks, boosting schemes (Freund & Schapire, 1997; Schapire, 1990; Solomatine & Shrestha, 2004), ensembles, etc. A paper by Loyola (this volume) in this issue addresses the applications of committees of ANNs to the analysis of satellite data.

Neural Networks 19 (2006) 215–224. www.elsevier.com/locate/neunet
0893-6080/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.neunet.2006.01.008
* Corresponding author. Tel.: +31 15 2151815. E-mail address: d.solomatine@unesco-ihe.org (D.P. Solomatine).
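A soft combination of committee members, as opposed to the hard routing above, can be sketched as follows. The Gaussian membership weights, the region centers, and the toy experts are illustrative assumptions only; the paper's 'fuzzy committees' and mixtures of experts use their own weighting schemes:

```python
import numpy as np

def soft_combine(x, experts, centers, width=1.0):
    """Blend local experts' outputs with smooth, normalized weights.

    experts : list of callables, one per region of the input space
    centers : assumed region centers; an expert's weight decays with the
              distance of the input from its center (Gaussian membership)
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(centers, dtype=float)
    w = np.exp(-((x[:, None] - c[None, :]) ** 2) / (2.0 * width ** 2))
    w /= w.sum(axis=1, keepdims=True)  # weights sum to 1 for each input
    outs = np.column_stack([m(x) for m in experts])
    return (w * outs).sum(axis=1)      # weighted committee output

# Two toy experts: constant 0 near x=0, constant 1 near x=10.
experts = [lambda x: np.zeros_like(x), lambda x: np.ones_like(x)]
print(soft_combine(np.array([0.0, 5.0, 10.0]), experts, [0.0, 10.0], width=2.0))
# ≈ [0, 0.5, 1]: smooth transition instead of a hard jump at the boundary
```

Unlike a hard split, the committee output varies continuously across region boundaries, which is the motivation for statistically or deterministically driven soft combinations.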