International Journal of Advanced Science and Technology
Vol.87 (2016), pp.1-8
http://dx.doi.org/10.14257/ijast.2016.87.01

ISSN: 2005-4238 IJAST
Copyright ⓒ 2016 SERSC

Feature Selection Using Rough Set For Improving the Performance of the Supervised Learner

D. Asir Antony Gnana Singh, E. Jebamalar Leavline, E. Priyanka and C. Sumathi

University College of Engineering, Bharathidasan Institute of Technology Campus, Anna University, Tiruchirappalli- 620 024

asirantony@gnail.com, jebi.lee@gmail.com, epriyanka76@gmail.com, sumathi205098@gmail.com

Abstract

Prediction plays a significant role in human life: people need to predict situations, climate, finances, and the outcomes of particular events or activities. Such prediction can be achieved with a classifier, formally known as a supervised learner. A classifier is built from a dataset, and its performance depends on how relevant the attributes (features) in the dataset are to the target attribute being predicted. The feature selection process removes redundant and irrelevant features from the dataset to improve the performance of the classifier. This paper proposes a rough set-based feature selection method that removes redundant and irrelevant features in order to improve the performance of the classifier. The proposed method is tested on various datasets with various supervised learning algorithms, and the results indicate that it performs better than the other methods compared.

Keywords: Rough set, classification, feature reduction, classifier, data mining

1. Introduction

Data mining is the process of analyzing large volumes of data to obtain useful information, typically in the form of patterns from which knowledge can be extracted. In other words, data mining helps us perform prediction by building a predictive model, also known as a classifier, that predicts unknown data from known data.
Machine learning algorithms are usually divided into two categories: supervised learning algorithms and unsupervised learning algorithms. A supervised learning algorithm, also known as a classification algorithm, builds a classifier to perform classification or prediction. Several classifiers have been developed in the classification literature, including the k-nearest neighbor algorithm (k-NN), Naïve Bayes (NB), the support vector machine (SVM), the decision tree, and so on. An unsupervised learning algorithm is a clustering algorithm: it builds a clustering model in order to group similar objects into the same category.

Feature selection, also called variable selection or variable subset selection, is the process of obtaining a subset of relevant features from a large dataset. Too many features may reduce the classification accuracy. Hence, feature selection is employed in data mining in order to improve the accuracy of the classifier. Feature selection algorithms can be classified into wrapper, filter, and embedded methods. In the filter method, the subset selection procedure is independent of the learning algorithm. This leads to a faster learning process. However, a subset selected according to a specific criterion may not work well with the learning algorithm. The wrapper method uses a predictive model to select the feature subsets. It attempts to select a minimal set of significant features according to criteria based on the output of the supervised learner adopted for the selection process. Each newly selected feature subset is used to train a model. This method trains a new model for every candidate subset, and hence