Using Multiple Windows To Track Concept Drift

Mihai M. Lazarescu, Svetha Venkatesh, Hung H. Bui
Faculty of Computer Science, Curtin University, GPO Box U1987, Perth 6001, W.A.
Email: {lazaresc@cs.curtin.edu.au}
January 16, 2003

Abstract

In this paper we present a multiple window incremental learning algorithm that distinguishes between virtual concept drift and real concept drift. The algorithm is unsupervised and uses a novel approach to tracking concept drift that involves the use of competing windows to interpret the data. Unlike previous methods which use a single window to determine the drift in the data, our algorithm uses three windows of different sizes to estimate the change in the data. The advantage of this approach is that it allows the system to progressively adapt and predict the change thus enabling it to deal more effectively with different types of drift. We give a detailed description of the algorithm and present the results obtained from its application to two real world problems: computing the background image and sound recognition. We also compare its performance with FLORA, an existing concept drift tracking algorithm.

1 Introduction

Research in machine learning has been mainly focused on single step non-incremental learning. The general procedure is as follows: all the training examples are presented to the system at the beginning of the learning process and the system develops, from the input data, descriptions for the concepts present in the training set. This type of learning has been proven to produce effective, efficient and good concept descriptions from a given set of examples (ID3 [9], C4.5 [10]). However, this type of system is limited since it does not have the ability to modify concept descriptions that are contradicted by new examples, and such systems must rebuild the concept completely in order to accommodate new facts.
Incremental learning does not have this limitation as it allows the concept descriptions to be modified to reflect new learning events. For this reason incremental learning is also more suited to real-world situations. Human learning is incremental. A human being develops concept descriptions based on the facts available at a given time instance and incrementally updates those descriptions as new facts become available. There are essentially two reasons why human beings learn incrementally: the facts are generally received in the form of a sequential flow of information and humans have limited memory and processing power (humans do not store everything they are exposed to, but rather only what they perceive as the most significant facts and generalizations) [11].

One of the problems associated with incremental learning is concept drift. Concept drift represents the change in a concept that is tracked over time. The drift is generally triggered by changes in the concept’s context.

In this paper we present a multiple window method that uses an estimate of the rate of change in the target concept to address the problem of concept drift. The major advantage of this approach is that it allows the system to progressively adapt to the change thus enabling it to deal more effectively with different types of drift. The motivation in using multiple windows is to determine the change in the data at different levels of resolution. Then, based on the persistence and consistency of the change, one of the levels of resolution is considered to be the “best” interpretation of the data. This multiple-resolution interpretation of the data coupled with the persistence of change allows the algorithm to distinguish between virtual concept drift, noise, real concept drift and more complex forms of drift such as merging concepts or crossing concepts.
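
To make the idea of viewing the data at several resolutions more concrete, the following Python sketch tracks a scalar stream with three sliding windows of different sizes and checks whether an observed deviation persists. It is only an illustration under stated assumptions and not the algorithm described in this paper: the class name `MultiWindowTracker`, the function `persistent_change`, the window sizes (5, 20, 50), the use of the window mean as the estimate, and the threshold test are all hypothetical choices made for the example.

```python
from collections import deque


class MultiWindowTracker:
    """Hypothetical helper: one sliding window per size over a scalar stream."""

    def __init__(self, sizes=(5, 20, 50)):
        # Assumed window sizes; each bounded deque drops its oldest sample.
        self.windows = {n: deque(maxlen=n) for n in sizes}

    def update(self, x):
        # Every window sees every new sample.
        for w in self.windows.values():
            w.append(x)

    def estimates(self):
        # Mean of each non-empty window (an assumed, very simple estimator).
        return {n: sum(w) / len(w) for n, w in self.windows.items() if w}


def persistent_change(deviations, threshold, min_run):
    """True if the last `min_run` deviations all exceed `threshold` in size."""
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    recent = deviations[-min_run:]
    return len(recent) == min_run and all(abs(d) > threshold for d in recent)


# Toy stream: 60 samples of an old concept (0) followed by 10 of a new one (1).
tracker = MultiWindowTracker()
for value in [0] * 60 + [1] * 10:
    tracker.update(value)
est = tracker.estimates()
```

In this toy stream the smallest window has already moved fully to the new value while the largest still mostly reflects the old one, which is the kind of disagreement between resolutions that a persistence check could then be used to resolve.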