Available online at www.sciencedirect.com
Information Fusion 9 (2008) 56–68
www.elsevier.com/locate/inffus

Dynamic integration of classifiers for handling concept drift

Alexey Tsymbal a,1, Mykola Pechenizkiy b,d,*, Pádraig Cunningham c, Seppo Puuronen d

a Siemens AG, Günther-Scharowsky-Str. 1, 91058 Erlangen, Germany
b Information Systems Group, Department of Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
c School of Computer Science and Informatics, University College, Dublin 4, Ireland
d Faculty of Information Technology, University of Jyväskylä, P.O. Box 35, Jyväskylä 40351, Finland

* Corresponding author. Address: Information Systems Group, Department of Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands.
E-mail addresses: alexey.tsymbal@siemens.com (A. Tsymbal), mpechen@cs.jyu.fi, m.pechenizkiy@tue.nl (M. Pechenizkiy), padraig.cunningham@ucd.ie (P. Cunningham), sepi@cs.jyu.fi (S. Puuronen).
1 Tel.: +49 9131 728796; fax: +49 9131 733190.
1566-2535/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.inffus.2006.11.002

Received 14 December 2005; received in revised form 13 October 2006; accepted 9 November 2006
Available online 10 January 2007

Abstract

In the real world concepts are often not stable but change with time. A typical example of this in the biomedical context is antibiotic resistance, where pathogen sensitivity may change over time as new pathogen strains develop resistance to antibiotics that were previously effective. This problem, known as concept drift, complicates the task of learning a model from data and requires special approaches, different from commonly used techniques that treat arriving instances as equally important contributors to the final concept. The underlying data distribution may change as well, making previously built models useless. This is known as virtual concept drift. Both types of concept drift make regular updates of the model necessary. Among the most popular and effective approaches to handle concept drift is ensemble learning, where a set of models built over different time periods is maintained and the best model is selected or the predictions of models are combined, usually according to their expertise level regarding the current concept. In this paper we propose the use of an ensemble integration technique that would help to better handle concept drift at an instance level. In dynamic integration of classifiers, each base classifier is given a weight proportional to its local accuracy with regard to the instance tested, and the best base classifier is selected, or the classifiers are integrated using weighted voting. Our experiments with synthetic data sets simulating abrupt and gradual concept drifts and with a real-world antibiotic resistance data set demonstrate that dynamic integration of classifiers built over small time intervals or fixed-sized data blocks can be significantly better than majority voting and weighted voting, which are currently the most commonly used integration techniques for handling concept drift with ensembles.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Machine learning; Changing environment; Concept drift; Ensemble learning; Dynamic integration of classifiers

1. Introduction

The problem of concept drift is of increasing importance to machine learning and data mining as more and more data is organized in the form of data streams rather than static databases, and it is rather unusual that concepts and data distributions stay stable over a long period of time [23,30].

Ensemble learning is among the most popular and effective approaches to handle concept drift. A set of concept descriptions built over different time intervals is maintained, and either their predictions are combined using a form of voting or the most relevant description is selected [12,14,20,21,28].

However, current ensemble approaches have a problem: they are not able to deal with local concept drift, which is a common case with real-world data. For example, only particular bacteria may develop resistance to certain antibiotics, while resistance to the others can remain the same; or the data distribution can change for particular bacteria depending on the season.
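The dynamic integration idea described in the abstract (weight each base classifier by its local accuracy around the tested instance, then either select the best classifier or take a weighted vote) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function names, the toy one-dimensional data, the hand-written base classifiers, and in particular the way local accuracy is estimated, namely as accuracy on the k nearest labelled reference instances. The text above does not specify that procedure.

```python
from collections import defaultdict
from math import dist


def local_accuracies(classifiers, reference, x, k=3):
    """Accuracy of each classifier on the k reference instances nearest to x.

    Assumption: `reference` is a list of (features, label) pairs that
    reflect the current concept; Euclidean distance defines "local".
    """
    neighbours = sorted(reference, key=lambda pair: dist(pair[0], x))[:k]
    return [
        sum(clf(feats) == label for feats, label in neighbours) / len(neighbours)
        for clf in classifiers
    ]


def dynamic_selection(classifiers, reference, x, k=3):
    """Predict with the single classifier that is locally most accurate."""
    acc = local_accuracies(classifiers, reference, x, k)
    best = max(range(len(classifiers)), key=acc.__getitem__)
    return classifiers[best](x)


def dynamic_voting(classifiers, reference, x, k=3):
    """Weighted vote; each classifier's weight is its local accuracy."""
    acc = local_accuracies(classifiers, reference, x, k)
    votes = defaultdict(float)
    for clf, weight in zip(classifiers, acc):
        votes[clf(x)] += weight
    return max(votes, key=votes.get)


def majority_voting(classifiers, x):
    """Baseline: unweighted vote that ignores where x lies."""
    votes = defaultdict(int)
    for clf in classifiers:
        votes[clf(x)] += 1
    return max(votes, key=votes.get)


# Toy local drift (hypothetical): the class used to be 1 only for v > 2;
# after the drift it is also 1 for v < -2, and nothing else has changed.
def old_a(x):  # built on an old data block
    return int(x[0] > 2.0)


def old_b(x):  # built on another old data block
    return int(x[0] > 2.2)


def new(x):  # built on a recent block that covered only the drifted region
    return int(x[0] < -2.0)


classifiers = [old_a, old_b, new]

# Labelled instances drawn from the current concept.
reference = [
    ((-4.0,), 1), ((-3.5,), 1), ((-2.5,), 1),
    ((-1.0,), 0), ((0.0,), 0), ((1.0,), 0),
    ((2.5,), 1), ((3.0,), 1), ((4.0,), 1),
]

query = (-3.0,)  # lies in the drifted region; the true label is 1
print(majority_voting(classifiers, query))               # 0
print(dynamic_selection(classifiers, reference, query))  # 1
print(dynamic_voting(classifiers, reference, query))     # 1
```

In this toy setting the two older models outvote the recent one under majority voting, while the locally weighted schemes follow the only model that is accurate near the query. Outside the drifted region (for example at v = 3) the older models receive the weight instead, which is the instance-level behaviour the abstract refers to.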