Optimal Bayesian Classification in Nonstationary Streaming Environments

Jehandad Khan, Nidhal Bouaynaya, Robi Polikar

Abstract—A novel method of classifying data drawn from a nonstationary distribution with drifting mean and variance is presented. The novelty of the approach lies in splitting the problem of tracking a nonstationary distribution into separate classification and time-series state estimation problems. State space models for drift in both the mean and the variance are presented, which are then successfully tracked using a Kalman filter and a particle filter for the linear and nonlinear parts, respectively. Preliminary results, which show the promising potential of the approach, are also presented, along with concluding remarks on potential uses of the proposed approach.

I. INTRODUCTION

Most classification algorithms rely on the underlying assumption that the distribution generating the data is stationary. However, this is a very restrictive assumption, since many real-world problems generate data whose underlying distributions change over time. Real-world applications that generate such nonstationary data include climate change, remote-sensing applications, metagenomic applications (genomic analysis of environmental samples, where species abundances change dramatically along unknown environmental gradients), analysis of web-user interest, identification of financial fraud from transaction data, and prediction of energy demand and pricing, among many others. Also relevant are installations with limited access (e.g., oil pipelines, building foundations, extreme geographic locations, etc.), where data subsequently collected from embedded sensors can be subject to a variety of nonstationary changes, e.g., cracks from freeze-thaw cycles, shifting tectonic plates, etc. The stationarity assumption is often used to simplify the mathematical setting of the problem, and thus also simplify the derived solutions.
However, this simplifying assumption forces the problem into a subspace of the original problem, often resulting in suboptimal solutions. Taking the nonstationary nature of the problem into consideration would allow us to take advantage of the full richness of the data, resulting in more accurate classification and prediction when tracking nonstationary environments.

Nonstationarity, also known as concept drift, can be treated using a variety of approaches, such as domain adaptation [1] [2], covariate shift [3], or more generally sample selection bias [4], or with specific ensemble-based approaches such as Learn++.NSE [5], DWM [6], and SEA [7]. These techniques acknowledge that the probability distribution that generated the data at any point in time is different from the probability distribution on which the classifier will make its prediction, i.e., p_s(x, y) ≠ p_t(x, y), where p_s and p_t are the source and target distributions, respectively, for the features x and labels y. These approaches rely on different assumptions about the source and target distributions: for example, in covariate shift it is assumed that the support of p_s(x, y) contains the support of p_t(x, y) [8]; thus the source and target distributions may be different but are still related. Moreover, it is also assumed that a sufficient amount of labeled and unlabeled data is available in the source and target domains, respectively.

J. Khan, N. Bouaynaya and R. Polikar are with the Dept. of Electrical & Computer Engineering at Rowan University (email: khanj6@students.rowan.edu, {bouaynaya,polikar}@rowan.edu). This material is based upon work supported by the National Science Foundation grants ECCS-1310496, CRI CNS-0855248, EPS-0701890, EPS-0918970, MRI CNS-0619069, and OISE-0729792. This project is also supported by Award Number R01GM096191 from the National Institute of General Medical Sciences (NIH/NIGMS).
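The idea of treating the linear (mean-drift) part of a nonstationary stream as a state estimation problem can be illustrated with a minimal sketch. The following is not the authors' implementation; it is a scalar Kalman filter with a random-walk state model tracking the drifting mean of a noisy stream, and all parameter values (noise variances q and r, stream length) are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a nonstationary stream: the mean drifts as a random walk
# (illustrative parameters, not taken from the paper).
T = 200
q, r = 0.05, 1.0                                      # process / observation noise variances
true_mean = np.cumsum(rng.normal(0, np.sqrt(q), T))   # drifting mean
obs = true_mean + rng.normal(0, np.sqrt(r), T)        # noisy samples from the stream

# Scalar Kalman filter with random-walk state model: m_t = m_{t-1} + w_t
m_hat, P = 0.0, 1.0        # initial state estimate and its variance
estimates = np.empty(T)
for t in range(T):
    # Predict: the random walk only inflates the state variance.
    P = P + q
    # Update with the new observation.
    K = P / (P + r)                     # Kalman gain
    m_hat = m_hat + K * (obs[t] - m_hat)
    P = (1 - K) * P
    estimates[t] = m_hat

rmse = np.sqrt(np.mean((estimates - true_mean) ** 2))
print(f"RMSE of tracked mean: {rmse:.3f}")
```

Even this one-dimensional version shows the appeal of the decomposition described in the abstract: the filter follows the drifting mean with an error well below the observation noise, while any classifier built on top of it only needs the current state estimate. The nonlinear variance-drift part, tracked with a particle filter in the paper, does not admit such a closed-form update.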
The aforementioned algorithms also require a large amount of labeled data (at least from the source domain), making the scarcity, or the high cost, of labeled data a potential obstacle to using these approaches. In medical diagnostics, for example, it is highly desirable that the learning algorithm be trained using a minimum number of subjects, typically due to the scarcity of consenting subjects, the monetary cost associated with running diagnostic tests, or even the rarity of the disease. Semi-Supervised Learning (SSL) has been used for such scenarios of limited labeled training data, wherein class information is propagated from a small number of labeled instances to the more abundant unlabeled instances [9] using approaches such as density separation, decision boundary detection, or graph construction. The primary focus of SSL techniques has been on stationary data environments, but there have been some recent advances that deal with data generated from nonstationary distributions. These methods still have the canonical SSL implementation at their core, with an exterior modification that caters for the drifting probability distributions. However, most such approaches still require that labeled data be available at each time point [10]. Active Learning (AL) is another approach used to tackle limited data availability: it selects the instances that provide maximum information about the class boundaries and then requests the corresponding labels. AL algorithms rely on the immediate availability of labels for any requested instances, an unrealistic expectation in certain applications. Our work focuses on the nonstationarity of the source and target distributions generating the data, as well as the scarcity of labeled data instances.
As stated above, these two problems are typically dealt with separately; but it is not unusual that both scenarios manifest themselves at the same time, hence

2014 International Joint Conference on Neural Networks (IJCNN), July 6-11, 2014, Beijing, China
978-1-4799-1484-5/14/$31.00 ©2014 IEEE