UCLA Computer Science Department Technical Report CSD-TR No. 030056

Learning Naive Bayes Classifier from Noisy Data

Yirong Yang, Yi Xia, Yun Chi, and Richard R. Muntz
University of California, Los Angeles, CA 90095, USA
{yyr,xiayi,ychi,muntz}@cs.ucla.edu

Abstract. Classification is one of the major tasks in knowledge discovery and data mining. The naive Bayes classifier, in spite of its simplicity, has proven surprisingly effective in many practical applications. In real datasets, noise is inevitable, because of imprecise measurement or privacy-preserving mechanisms. In this paper, we develop a new approach, the LinEar-Equation-based noise-aWare bAYes classifier (LEEWAY), for learning the underlying naive Bayes classifier from noisy observations. Using a linear system of equations and optimization methods, LEEWAY reconstructs the underlying probability distributions of the noise-free dataset from the given noisy observations. By incorporating the noise model into the learning process, we improve the classification accuracy. Furthermore, as an estimate of the underlying naive Bayes classifier for the noise-free dataset, the reconstructed model can easily be combined with new observations corrupted at different noise levels to obtain good predictive accuracy. Several experiments are presented to evaluate the performance of LEEWAY. The experimental results show that LEEWAY is an effective technique for handling noisy data and that it provides higher classification accuracy than other traditional approaches.

Keywords: naive Bayes classifier, noisy data, classification, Bayesian network.

1 Introduction

Classification is one of the major tasks in knowledge discovery and data mining. The naive Bayes classifier, in spite of its simplicity, has proven surprisingly effective in many practical applications, including natural language processing, pattern classification, medical diagnosis, and information retrieval [12].
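As background for the discussion that follows, the standard naive Bayes prediction rule can be sketched in a few lines of Python: estimate P(C = c) and P(X_i = x | C = c) from counts over the training tuples, then pick the class maximizing their product. The toy dataset, function names, and plain count-based estimates below are illustrative assumptions, not part of this paper's method.

```python
from collections import defaultdict

def train_naive_bayes(data):
    """Tabulate counts for P(C = c) and P(X_i = x | C = c) from
    <feature vector, class value> pairs."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)   # keyed by (class, feature index, value)
    for features, c in data:
        class_counts[c] += 1
        for i, x in enumerate(features):
            feat_counts[(c, i, x)] += 1
    return class_counts, feat_counts

def predict(class_counts, feat_counts, features):
    """Return the class c maximizing P(c) * prod_i P(x_i | c),
    using the conditional-independence assumption."""
    total = sum(class_counts.values())
    best_c, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                       # prior P(c)
        for i, x in enumerate(features):
            score *= feat_counts[(c, i, x)] / n_c # likelihood P(x_i | c)
        if score > best_score:
            best_c, best_score = c, score
    return best_c

# Toy dataset: two binary features, binary class (an assumption for illustration).
data = [((1, 1), "yes"), ((1, 0), "yes"), ((0, 1), "no"), ((0, 0), "no")]
cc, fc = train_naive_bayes(data)
print(predict(cc, fc, (1, 1)))  # → yes
```

A production implementation would additionally smooth the count-based estimates (e.g., Laplace smoothing) so that an unseen feature value does not zero out an entire class's score.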
The input dataset for a naive Bayes classifier is a set of structured tuples comprised of <feature vector, class value> pairs. The fundamental assumption of the naive Bayes classifier is that the feature variables are conditionally independent given the class value. The classifier learns from the training dataset the conditional probability distribution of each feature variable Xi given the class value c. Given a new instance <x1, x2, ..., xn> of the feature vector <X1, X2, ..., Xn>, the goal of classification is then to predict the class value c with the highest posterior probability P(C = c | x1, x2, ..., xn). The classification accuracy depends not only on the learning algorithm, but also on the quality of the input dataset. In a real dataset, noise is inevitable,