Intelligent Data Analysis 22 (2018) 787–806 787 DOI 10.3233/IDA-173522 IOS Press An online adaptive classifier ensemble for mining non-stationary data streams Alberto Verdecia-Cabrera a,* , Isvani Frías Blanco b and André C.P.L.F. Carvalho b a Universidad Central de Las Villas, Cuba b Universidade de São Paulo, Brazil Abstract. Many real-world situations constantly generate concept-drifting data streams at high speed. These situations demand adaptive algorithms able to learn online in accordance with the most recent target function (concept). This paper presents Online Adaptive Classifier Ensemble, a new ensemble algorithm able to learn from concept-drifting data streams. The proposed algorithm uses a change detection mechanism in each base classifier in order to handle possible changes in the underlying target function. Each base classifier in the ensemble can alternate between three different stages during the learning process: stable, warning and drift. In a stable stage, the underlying target function is supposed to remain constant, and the corresponding base classifier is updated with each incoming training instance. In a warning stage, a possible change in the target function can be starting to occur, and an alternative base classifier is created and trained together with the other base classifiers. The alternative classifier is added to the ensemble if the drift stage is reached. The new algorithm is compared with various state-of-the-art ensemble algorithms for online learning. Empirical studies show that this proposal is an effective alternative for learning from non-stationary data streams. Keywords: Classifier ensemble, concept drift, data stream, massive data, online learning 1. Introduction Nowadays, very heterogeneous sources generate massive data continuously, without control of the arrival order and at high speed. Internet, cell-phones, cars, and security sensors are examples of such sources [22]. Because of the temporal dimension of the data and the dynamic aspect of many real-world situations, the target function to be learned can change over time. This situation, known as concept drift, complicates the task of estimating the target function, since a previously learned model can become outdated, or even contradictory regarding the most recent data. Learning from data streams is directly related to concept drift, because it is an inherent feature that usually appears when data arrive over time. Examples of real-world situations where these changes can emerge include changes in clothing fashion, news preferences and energy consumption [22,44]. Spam filtering is another common example: spammers try to elude filters by changing the pattern of spam emails, requiring the continuous update of spam filters [25]. An effective method used to learn from non-stationary data streams is to ensemble classifiers. En- sembles combine learning models with the goal of improving the predictive accuracy obtained by single classifiers. To deal with concept drift, ensemble algorithms have adopted two main strategies in the * Corresponding author: Alberto Verdecia Cabrera, Universidad Central de Las Villas, Cuba. E-mail: averdeciac@udg.co.cu. 1088-467X/18/$35.00 c 2018 – IOS Press and the authors. All rights reserved