PCA-NB Algorithm to Enhance the Predictive Accuracy

T. Karthikeyan 1, P. Thangaraju 2
1 Associate Professor, Dept. of Computer Science, P.S.G Arts and Science College, Coimbatore, India
2 Research Scholar, Bharathiar University; Asst. Professor, Dept. of Comp. Applications, Bishop Heber College, Tiruchirappalli, India
1 t.karthikeyan.gasc@gmail.com
2 thangarajubhc@yahoo.co.in

Abstract- This paper deals with a feature extraction algorithm used to improve the predictive accuracy of classification. Principal Component Analysis (PCA) is applied as the feature evaluator, together with a ranker search method, and the Naive Bayes (NB) algorithm is used as the classifier. The method is evaluated on the hepatitis patient data from the UC Irvine Machine Learning Repository, and the classification model is assessed in terms of accuracy and time. The paper concludes that the proposed PCA-NB algorithm performs better than the other classification techniques considered for hepatitis patients.

Keyword- Feature Extraction, Classification, Principal Component Analysis, Naive Bayes

I. INTRODUCTION
In data mining and image processing, feature extraction is a special form of dimensionality reduction. When the input data to an algorithm is too large to be processed and is suspected to be highly redundant, it is transformed into a reduced representation, a set of features. This transformation of the input data into a set of features is called feature extraction [1]. If the extracted features are carefully chosen, the feature set is expected to capture the relevant information in the input data, so that the desired task can be performed on the reduced representation instead of the full-size input. Feature extraction thus reduces the amount of resources required to describe a large data set accurately, and it can be applied in many data mining applications to improve predictive accuracy.
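As an illustrative sketch of feature extraction (not the implementation used in this paper), the following Python code projects a synthetic, deliberately redundant three-feature data set onto its first principal component. The data, the function names, and the use of power iteration to find the component are all assumptions made for illustration.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=200):
    """Dominant eigenvector of cov by power iteration, scaled to unit length.

    Assumes the starting vector is not orthogonal to the dominant
    eigenvector, which holds for the synthetic data below.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def extract_feature(rows):
    """Replace each row by its coordinate along the first principal component."""
    centered = center(rows)
    cov = covariance(centered)
    pc = first_component(cov)
    scores = [sum(x * p for x, p in zip(row, pc)) for row in centered]
    d = len(pc)
    # Variance along pc (v^T C v) divided by the total variance (trace of C).
    along_pc = sum(pc[i] * sum(cov[i][j] * pc[j] for j in range(d))
                   for i in range(d))
    total = sum(cov[i][i] for i in range(d))
    return scores, pc, along_pc / total


# Hypothetical redundant data: three features driven by one hidden value t.
rng = random.Random(0)
rows = []
for _ in range(200):
    t = rng.gauss(0.0, 1.0)
    rows.append([t + rng.gauss(0.0, 0.05),
                 2.0 * t + rng.gauss(0.0, 0.05),
                 -t + rng.gauss(0.0, 0.05)])

scores, pc, explained = extract_feature(rows)
print("3 features -> 1 feature; fraction of variance kept:", round(explained, 4))
```

Because the three synthetic features are nearly copies of one another, a single extracted feature retains almost all of the variance. Real data is rarely this redundant, so more components would normally be kept.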
The objective of this study is to predict the life expectancy of patients with hepatitis from a hepatitis data set and to improve the classification accuracy. The Naive Bayes algorithm is used for classification and prediction. To increase its accuracy, Principal Component Analysis [2] is used for feature reduction, so that noisy or irrelevant features are dealt with. The prediction accuracy of Naive Bayes is then compared with that of other classification algorithms such as J48, Multilayer Perceptron (MLP), and Radial Basis Function (RBF). This paper is organized as follows. Section 2 deals with related work. Section 3 presents the concepts of feature extraction and principal component analysis. Section 4 elaborates on the Naive Bayes classification algorithm. Section 5 describes the data set. Section 6 presents the proposed methodology, and Section 7 reports the performance evaluation.

II. RELATED WORK
Many feature extraction methods have been applied to medical diagnosis problems, and most of them have achieved improved classification accuracy. Kemal Polat et al. used an artificial immune recognition system with principal component analysis (PCA) for classification, evaluated via 10-fold cross-validation [3]. Tahseen A. Jilani et al. used a PCA-ANN based classification algorithm for hepatitis disease diagnosis [4]. Heng Lian used principal component analysis and one-class support vector machines for image retrieval [5]. Hlip-ling Chen et al. used a hybrid prediction model that integrates local discriminant analysis and support vector machines for hepatitis disease diagnosis [6]. Yilmaz Kaya and Murat Uyar developed a hybrid decision support system based on rough sets and an extreme learning machine for the diagnosis of hepatitis disease [7]. Huda Yasin et al. used PCA as a feature extraction tool together with regression analysis and achieved 89% classification accuracy [8].
Javad Salami Sartakhti et al. applied a support vector machine and simulated annealing to hepatitis disease diagnosis [9]. In this work, the PCA feature extraction method is used to improve the classification accuracy. The objective of the proposed method is to explore the performance of hepatitis diagnosis using an algorithm that integrates PCA with Naive Bayes. The proposed method (PCA-NB) first uses PCA to reduce the dimension of the hepatitis data set; the reduced feature subset then serves as the input to the NB classifier. The effectiveness of PCA-NB is examined in terms of classification accuracy, sensitivity, specificity, and precision. Further, the superior classification capability of the proposed method can be observed by comparing its results with those obtained using MLP based on PCA (PCA-MLP), RBF based on PCA (PCA-RBF), J48 based on PCA (J48-PCA), Random
T.Karthikeyan et al. / International Journal of Engineering and Technology (IJET) ISSN : 0975-4024 Vol 6 No 1 Feb-Mar 2014 381