A Comparison of Fuzzy ARTMAP and Gaussian ARTMAP Neural Networks for Incremental Learning

Eric Granger, Jean-François Connolly and Robert Sabourin

Abstract— Automatic pattern classifiers that allow for incremental learning can adapt internal class models efficiently in response to new information, without having to retrain from the start using all the cumulative training data. In this paper, the performance of two such classifiers – the fuzzy ARTMAP and Gaussian ARTMAP neural networks – is characterized and compared for supervised incremental learning in environments where class distributions are fixed. Their potential for incremental learning of new blocks of training data, after having previously been trained, is assessed in terms of generalization error and resource requirements, for several synthetic pattern recognition problems. The advantages and drawbacks of these architectures are discussed for incremental learning with different data block sizes and data set structures. Overall results indicate that Gaussian ARTMAP is the more suitable for incremental learning, as it usually provides an error rate comparable to that of batch learning on the same data sets, over a wide range of training block sizes. This better performance is a result of representing categories as Gaussian distributions, and of using a category-specific learning rate that decreases during the training process. With fuzzy ARTMAP, in contrast, the error rate obtained through incremental learning is usually significantly higher than through batch learning on all the data sets. Training fuzzy ARTMAP and Gaussian ARTMAP through incremental learning often requires fewer training epochs to converge, and leads to more compact networks.

I. INTRODUCTION

For a wide range of applications, machine learning represents a cost-effective and practical approach to the design of pattern classification systems.
However, the performance of pattern classifiers depends heavily on the availability of representative training data, and the acquisition (collection and analysis) of such data is expensive and time consuming in many practical applications. Data presented to a pattern classifier, during either the training or operational phases, may therefore be incomplete in one of several ways. In static environments, where class distributions remain fixed, these include a limited number of training observations, missing components of the input observations, missing class labels during training, and missing classes (i.e., some classes that were not present in the training data set may be encountered during operations) [8]. In addition, new information, such as input components, output classes, and drifting classes, may suddenly emerge in dynamically-changing environments, where class distributions vary in time.

The authors are with the Laboratoire d'imagerie, de vision et d'intelligence artificielle (LIVIA), École de technologie supérieure, 1100 Notre-Dame Ouest, Montreal, Qc., H3C 1K3, Canada, email: eric.granger@etsmtl.ca, jfconnolly@livia.etsmtl.ca, robert.sabourin@etsmtl.ca. This research was supported in part by the Natural Sciences and Engineering Research Council of Canada.

Fig. 1. A generic incremental learning scenario where blocks of data are used to update the classifier in an incremental fashion over a period of time. Let D1, D2, ..., Dn+1 be the blocks of training data available to the classifier at discrete instants in time t1, t2, ..., tn+1. The classifier starts with initial hypothesis h0, which constitutes the prior knowledge of the domain. Thus, h0 gets updated to h1 on the basis of D1, h1 gets updated to h2 on the basis of D2, and so forth [5].
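The block-by-block update of Figure 1 can be sketched as follows. The nearest-class-mean learner below is a toy stand-in of our own (an assumption for illustration, not an ARTMAP model), chosen only because its hypothesis can be updated from a new block Dn without revisiting any earlier block; note how a class absent from D1 is accommodated on the fly.

```python
class NearestClassMean:
    """Toy incrementally trainable classifier: each class is summarized
    by a running mean, updated one data block at a time."""

    def __init__(self):
        self.sums = {}    # per-class running sum of feature vectors
        self.counts = {}  # per-class observation counts

    def update(self, block):
        # Incremental update h_n -> h_{n+1}: only the new block D_{n+1}
        # is needed, never the cumulative data D_1 .. D_n.
        for x, label in block:
            s = self.sums.setdefault(label, [0.0] * len(x))
            self.sums[label] = [a + b for a, b in zip(s, x)]
            self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, x):
        # Assign x to the class whose mean is closest (squared distance).
        def dist(label):
            mean = [v / self.counts[label] for v in self.sums[label]]
            return sum((a - b) ** 2 for a, b in zip(mean, x))
        return min(self.counts, key=dist)


# h0 -> h1 -> h2: blocks arrive at discrete instants in time.
h = NearestClassMean()
D1 = [((0.0, 0.0), "A"), ((1.0, 1.0), "B")]
D2 = [((0.1, 0.1), "A"), ((5.0, 5.0), "C")]   # class C was unseen in D1
h.update(D1)
h.update(D2)
print(h.predict((4.8, 5.2)))  # the newly encountered class C is recognized
```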
Given a static environment in which training data is incomplete, a critical feature of future automatic classification systems designed according to the machine learning approach is the ability to update their class models incrementally during operational phases, in order to adapt to novelty encountered in the environment [5] [10]. Ideally, as new information becomes available, internal class models should be refined, and new ones should be created on the fly, without having to retrain from the start using all the cumulative training data. For instance, in many practical applications, it is common to acquire additional training data from the environment at some point in time after the classification system has originally been trained and deployed for operations (see Figure 1). Assume that this data is characterized and labeled by a domain expert; it may contain observations belonging to classes that are not present in previous training data, and its classes may have a wide range of distributions. It may be too costly or not feasible to accumulate and store all the data used thus far for supervised training, and to retrain a classifier using all the cumulative data¹. In this case, it may only be feasible to update the system through supervised incremental learning. Assuming that new training data becomes available, incremental learning provides the means to efficiently maintain accurate and up-to-date class models. Another advantage of incremental learning is the low computational complexity required to update a classifier. Indeed, temporary storage of

¹ To learn new data, the vast majority of classification algorithms proposed in the literature must accumulate and store all training data in memory, and retrain from the start using all previously-accumulated training data.

978-1-4244-1821-3/08/$25.00 © 2008 IEEE
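For contrast with incremental updating, the batch regime described in footnote 1 can be sketched as below: every block must be retained, and a fresh model is retrained from the start on the cumulative data after each new block arrives. The wrapper and the trivial majority-vote model are illustrative assumptions, not the experimental protocol used in this paper.

```python
class MajorityModel:
    """Toy batch model: predicts the most frequent label seen in training."""

    def __init__(self):
        self.counts = {}

    def train(self, block):
        for _, label in block:
            self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self):
        return max(self.counts, key=self.counts.get)


class BatchWrapper:
    """Batch learning as in footnote 1: accumulate all blocks in memory
    and retrain from scratch on the cumulative data at every update."""

    def __init__(self, make_model):
        self.make_model = make_model
        self.history = []                  # D1 .. Dn are ALL kept in memory

    def update(self, block):
        self.history.append(block)
        self.model = self.make_model()     # discard the previous hypothesis
        for past in self.history:          # retraining cost grows with sum|Di|
            self.model.train(past)


bw = BatchWrapper(MajorityModel)
bw.update([((0,), "A"), ((1,), "A")])      # block D1
bw.update([((2,), "B")])                   # block D2 forces full retraining
stored = sum(len(b) for b in bw.history)
print(stored)  # 3 observations retained, versus none for incremental updating
```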