International Journal of Computer Applications (0975 - 8887)
Volume 161 - No.6, March 2017

RGAP: A Rough Set, Genetic Algorithm and Particle Swarm Optimization based Feature Selection Approach

Anupriya Gupta
Computer Engineering Department
Shri G.S Institute of Technology and Science
Indore-452003 (M.P.) India

Anuradha Purohit
Computer Technology and Applications Department
Shri G.S Institute of Technology and Science
Indore-452003 (M.P.) India

ABSTRACT
Feature selection plays an important role in improving classification accuracy by handling redundant or irrelevant features present in the dataset. Various soft computing based hybrid approaches, such as neuro-fuzzy, genetic-fuzzy and rough set-neuro, have been proposed by researchers to perform feature selection. The existing approaches incur high complexity and computational cost while yielding low classification accuracy. Hence, to reduce complexity and improve classification accuracy, a hybrid approach based on Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Rough Set Theory (RST) is proposed to perform feature selection. In the proposed approach, GA is used as the search algorithm. To explore the search space more efficiently, GA is combined with a PSO based local search operation. The Rough Set Attribute Reduction (RSAR) method, based on RST, is used to compute core reducts. The proposed algorithm is tested on various benchmark datasets. Satisfactory improvements in terms of complexity and classification accuracy have been achieved.

Keywords
Feature Selection, Particle Swarm Optimization, Genetic Algorithm, Rough Set Theory.

1. INTRODUCTION
Feature selection has long been an area of interest for researchers dealing with small or large datasets in data mining tasks. Feature selection aims to choose a small number of relevant features that achieve similar or even better classification performance than using all features. Filter and wrapper are the two main categories of feature selection methods.
Filter methods use a proxy measure to score a feature subset. Wrapper methods, in contrast, use a predictive model and are computationally intensive, but usually provide the most relevant feature subset [3]. Many approaches are available for performing feature subset selection, such as piecewise linear network, PLOF’S, PCA, MCES, graph based clustering and soft computing. Soft computing is one of the most widely used approaches for feature selection. It does not refer to a single computational technique but comprises many components. For more optimized and efficient results, researchers have developed hybrid approaches that combine different soft computing techniques such as artificial neural networks, fuzzy inference systems, approximate reasoning, and optimization methods such as evolutionary computation, swarm optimization and rough sets [5]. Empirical results show that these hybrid approaches are well suited to dealing with incomplete and imperfect knowledge, and hence yield more appropriate results than single approaches.

Xiangyang Wang et al. [4] proposed an optimal feature selection technique based on rough sets and particle swarm optimization (PSO). In their approach, the RST based positive region method is used to compute the core reducts. PSO is then applied to explore the search space and drive the feature selection process. An RST based fitness function is used to evaluate the fitness of each particle and to validate the result. Their approach suffers from the premature convergence problem due to PSO.

Pradipta Maji [6] performed feature selection using rough set theory on fuzzy datasets, presenting a novel dimensionality reduction method based on fuzzy-rough sets. To compute the relevant features, an RST based discernibility matrix method is used for dimensionality reduction and applied to the fuzzy datasets.
To explore the search space, another RST based method, the measure of significance, is used. This approach provides efficient results but incurs high complexity and computational cost.

Si-Yuan Jing [1] discussed a hybrid approach, "HGARSTAR", which combines genetic algorithm and rough set theory to perform feature selection. Initially, the core features are computed using the RST based positive region method. A novel local search operation based on rough set theory is embedded in the genetic algorithm to enhance the search for better results. To fine tune the search further, the significance of each feature is computed using the measure of significance method of RST, and a rough set based fitness function is again used to validate the result. This approach suffers from high complexity and extensive computational cost due to the RST based local search.

In this paper, a hybrid approach, "RGAP: A Rough Set, Genetic Algorithm and Particle Swarm Optimization based Feature Selection Approach", is proposed to perform feature selection. In the proposed approach, GA is used as the search technique and, to explore the search space more effectively, GA operators are combined with a PSO based local search operation. To obtain more optimized results, the RST based RSAR method is used to compute core reducts.

The rest of the paper is organized as follows: Section 2 discusses preliminaries of the various techniques used in the proposed method.