To Better Handle Concept Change and Noise: A Cellular Automata Approach to Data Stream Classification

Sattar Hashemi 1, Ying Yang 1, Majid Pourkashani 2, Mohammadreza Kangavari 2

1 Clayton School of Information Technology, Monash University, Australia
{Sattar.Hashemi,Ying.Yang}@infotech.monash.edu.au

2 Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran
{mpkashani, kangavari}@iust.ac.ir

Abstract: A key challenge in data stream classification is to detect changes in the concept underlying the data and to adapt classifiers to each concept change accurately and efficiently. Most existing methods for handling concept changes take a windowing approach, where only recent instances are used to update classifiers while old instances are discarded indiscriminately. However, this approach can often be undesirably aggressive, because many old instances may not be affected by the concept change and hence can still contribute to training the classifier, for instance by reducing the classification variance error caused by insufficient training data. Accordingly, this paper proposes a cellular automata (CA) approach that feeds classifiers with the most relevant instead of the most recent instances. The strength of CA is that it breaks a complicated process down into smaller adaptation tasks, for each of which a single automaton is responsible. Using the neighborhood rules embedded in each automaton and the emerging time of instances, this approach assigns a relevance weight to each instance. Instances with high enough weights are selected to update classifiers. Theoretical analyses and experimental results suggest that a good choice of local rules for CA can considerably speed up the updating of classifiers in response to concept changes, increase classifiers’ robustness to noise, and thus offer faster and better classification of data streams.

Keywords: Data Stream Classification, Cellular Automata, Concept Change, Noise Suppression

1. Introduction

Nowadays, there are many applications in which data are not static but streaming, such as sensor network data and credit card transactions. In data streams, the concept underlying the data may change over time, which can cause the accuracy of current classifiers to decrease. Meanwhile, real-world data are seldom perfect and often suffer from a significant amount of noise, which may affect the accuracy of induced classifiers. Dealing with concept changes and differentiating them from noise has become an interesting and challenging task in the machine learning and data mining community [8,3,24].