Fast Feature Selection for Naive Bayes Classification in Data Stream Mining

Patricia E.N. Lutu

Abstract - Stream mining is the process of mining a continuous, ordered sequence of data items in real-time. Naïve Bayes (NB) classification is one of the popular classification methods for stream mining because it is an incremental classification method whose model can be easily updated as new data arrives. It has been observed in the literature that the performance of the NB classifier improves when irrelevant features are eliminated from the modeling process. This paper reports studies that were conducted to identify efficient computational methods for selecting relevant features for NB classification based on the sliding window method of stream mining. The paper also provides experimental results which demonstrate that continuous feature selection for NB stream mining provides high levels of predictive performance.

Index terms - data mining, feature selection, Naïve Bayes classification, stream mining

I. INTRODUCTION

Predictive data mining involves the creation of classification or regression models. A classification model predicts the value of a categorical dependent variable, while a regression model predicts the value of a numeric dependent variable. Data stream mining, also known as stream mining, is the process of mining a continuous, ordered sequence of data items in real-time [1], [2], [3]. Naïve Bayes (NB) classification is one of the popular classification methods for stream mining. The popularity of the NB classifier for stream mining stems from the fact that it is very easy to update the NB model for classification as new stream data arrives. It has been observed in the literature that the performance of the optimal Bayes classifier (from which the NB classifier is derived) is not affected by irrelevant features, that is, features with little or no predictive power.
However, it has also been observed that the performance of the NB classifier improves when irrelevant features are eliminated from the modeling process. Since stream mining is done in real time, there is a need to employ fast methods of modeling. This paper reports studies that were conducted to identify efficient computational methods for selecting relevant features for NB classification based on the sliding window method of stream mining. The paper also provides experimental results which demonstrate that continuous feature selection for NB stream mining provides higher levels of predictive performance than once-off feature selection.

The rest of the paper is organised as follows: Section II provides background on stream mining, Naïve Bayes classification and feature selection. Section III presents the experimental methods. Section IV presents the experimental results. Section V concludes the paper.

Manuscript received 25 March 2013; revised 14 April 2013. P. E. N. Lutu is a Senior Lecturer in the Department of Computer Science, University of Pretoria, Pretoria 0002, Republic of South Africa; phone: +27 12 420 4116; fax: +27 12 362 5188; web: http://www.cs.up.ac.za/~plutu; e-mail: Patricia.Lutu@up.ac.za

II. BACKGROUND

A. Stream mining

Data collected over time is commonly described as a data stream. More precisely, a data stream is a real-time, continuous, ordered sequence of data items [1], [2], [3]. One major challenge in mining data streams arises from the fact that it is infeasible to store the data stream in its entirety. This makes it necessary to select, and use for the mining task, training data that is not outdated. The second challenge for stream mining is the phenomenon of concept drift, which is defined as the gradual or rapid changes in the concept that a mining algorithm attempts to model [1], [2], [3].
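The first of these challenges, limiting the model to training data that is not outdated, can be sketched as a fixed-size buffer that automatically discards the oldest items as new ones arrive. This is a minimal illustration (the window width of 3 is an arbitrary assumption, not a value from the paper):

```python
from collections import deque

# Hypothetical sketch: keep only the W most recent stream items so that
# stale data cannot influence the model. W = 3 is chosen for illustration.
W = 3
window = deque(maxlen=W)  # a bounded deque evicts its oldest item automatically

for item in [10, 20, 30, 40, 50]:  # simulated stream of arriving data items
    window.append(item)

print(list(window))  # only the three most recent items remain: [30, 40, 50]
```

The `maxlen` argument makes eviction of stale items implicit, which is the essential property of a fixed-width window over a stream.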
Given that data items arrive continuously and that the concept being modeled changes gradually or rapidly, there is a need to employ fast methods of modeling for stream mining.

Predictive modeling, e.g. predictive classification, is commonly applied to stream data. Predictive classification involves the estimation of the conditional probability Pr(c_j | x) of assigning a class label c_j to an instance vector x. This probability is related to the probability Pr(x) of encountering an instance with feature vector x. For predictive classification, changes in Pr(x) imply that changes have occurred in the probability distribution of the predictive feature values of the concept for which the model is being created. Gao et al. [2], [4] call these changes 'feature changes'.

One approach to selecting data for mining data streams is called the sliding window approach. A sliding window, which may be of fixed or variable width, provides a mechanism for limiting the data to be analysed to the most recent instances. The main advantage of this technique is that it prevents stale data from influencing the models obtained in the mining process [5], [6]. The studies reported in this paper are based on the sliding window technique.

B. Naïve Bayes classification

For predictive classification, the training dataset for a classifier is typically characterised by d predictor variables X_1, ..., X_d and a class variable C. Predictor variables are also known as the features for the prediction task. The set of n training instances is denoted as {(x, c_j)}, where x = (x_1, ..., x_d) are the values of a training instance and c_j ∈ {c_1, ..., c_J} are the class labels. Naïve Bayes classification has been reported in the literature as one of the 'ideal' algorithms for stream mining, due to its incremental nature.

Proceedings of the World Congress on Engineering 2013 Vol III, WCE 2013, July 3 - 5, 2013, London, U.K.
ISBN: 978-988-19252-9-9; ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online)
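The incremental nature of NB over a sliding window can be made concrete: the classifier only needs class counts and per-class feature-value counts, which can be incremented as an instance enters the window and decremented as it slides out. The sketch below is a hypothetical illustration of this idea, not the paper's implementation; the class name, window width, toy stream, and Laplace smoothing are all assumptions made for the example:

```python
import math
from collections import defaultdict, deque

class SlidingWindowNB:
    """Hypothetical sketch (not the paper's implementation): a Naive Bayes
    classifier maintained incrementally over a fixed-width sliding window.
    Counts are added when an instance enters the window and subtracted when
    it slides out, so the model reflects only the most recent instances."""

    def __init__(self, width):
        self.width = width
        self.window = deque()
        self.class_counts = defaultdict(int)    # counts for each class c_j
        self.feature_counts = defaultdict(int)  # counts for (X_i = v, c_j)
        self.n = 0                              # instances currently in window

    def update(self, x, c):
        # Incorporate the newly arrived instance (x, c).
        self.window.append((x, c))
        self.class_counts[c] += 1
        for i, v in enumerate(x):
            self.feature_counts[(i, v, c)] += 1
        self.n += 1
        # Evict the oldest instance once the window exceeds its width.
        if len(self.window) > self.width:
            old_x, old_c = self.window.popleft()
            self.class_counts[old_c] -= 1
            for i, v in enumerate(old_x):
                self.feature_counts[(i, v, old_c)] -= 1
            self.n -= 1

    def predict(self, x):
        # Pr(c_j | x) is proportional to Pr(c_j) * prod_i Pr(x_i | c_j).
        # Log-probabilities with Laplace smoothing avoid underflow and zeros.
        best_class, best_score = None, float("-inf")
        for c, nc in self.class_counts.items():
            if nc == 0:
                continue
            score = math.log((nc + 1) / (self.n + len(self.class_counts)))
            for i, v in enumerate(x):
                score += math.log(
                    (self.feature_counts[(i, v, c)] + 1) / (nc + 2))
            if score > best_score:
                best_class, best_score = c, score
        return best_class

# Toy stream: instances with two binary features and classes 'a' and 'b'.
nb = SlidingWindowNB(width=100)
for x, c in [((1, 0), 'a'), ((1, 1), 'a'), ((0, 0), 'b'), ((0, 1), 'b')]:
    nb.update(x, c)
print(nb.predict((1, 0)))  # 'a'
```

Because an update touches only the counts for one arriving and (at most) one departing instance, the cost per stream item is O(d), which is what makes NB attractive for the real-time setting described above.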