International Journal of Innovative Research in Computer Science & Technology (IJIRCST) ISSN: 2347-5552, Volume-2, Issue-3, May-2014

Clustering Based Algorithm for Efficient and Effective Feature Selection Performance

K. Revathi, T. Kalai Selvi

Abstract- Processing high-dimensional data is a major issue in data mining and machine learning applications. Feature selection is the process of identifying a small subset of features that produces results comparable to the original full set of features. The feature selection process provides a pathway to reduce dimensionality and time complexity while also improving classifier accuracy. In this paper, we use an alternative approach, the affinity propagation algorithm, for effective and efficient feature selection and clustering. The aim is to improve performance in terms of accuracy and time complexity.

Key Terms- Classification, Data mining, Feature selection, Feature clustering.

I. INTRODUCTION

Data mining is the process of discovering interesting knowledge from large information repositories. Noisy, incomplete, and inconsistent records are common properties of large real-world databases and data warehouses. To handle these types of errors, data preprocessing techniques are essential for producing good-quality results. Feature selection, also known as attribute subset selection, is a preprocessing technique used for dimensionality reduction; it improves classifier accuracy and removes irrelevant and redundant data. Feature selection techniques are categorized into four types: filter, wrapper, embedded, and hybrid methods [1]. The filter method [11], [12] is a significant choice when the number of features is large; it filters features using a ranking-based approach. The wrapper method [2], [14] estimates the quality of a selected feature subset using the predictive accuracy of a machine learning algorithm [1], [14], which provides the greatest accuracy.
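The filter strategy described above can be illustrated with a minimal sketch: each feature is scored independently against the class label and the top-ranked features are kept. The scoring criterion used here (absolute Pearson correlation) and the toy dataset are illustrative assumptions, not taken from the paper, which leaves the ranking criterion unspecified.

```python
# A minimal sketch of filter-style feature ranking. Each feature column is
# scored independently (here by absolute Pearson correlation with the class
# label, one common ranking criterion) and the k best-scoring features are
# kept. The dataset and feature names below are hypothetical examples.

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def filter_rank(features, labels, k):
    """Score every feature independently, then keep the k highest-scoring."""
    scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

features = {
    "f1": [1, 2, 3, 4, 5, 6],  # tracks the label closely (relevant)
    "f2": [5, 1, 4, 2, 6, 3],  # weakly related to the label
    "f3": [0, 0, 1, 1, 0, 1],  # partially related
}
labels = [0, 0, 0, 1, 1, 1]
print(filter_rank(features, labels, 2))  # → ['f1', 'f3']
```

Because each feature is scored in isolation, this approach is fast on large feature sets but, as the paper later notes for Relief, it cannot detect redundancy between the selected features themselves.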
The embedded method [14] is more efficient than the remaining methods because it works within the training process itself.

(Manuscript received April 04, 2014. K. Revathi, Computer Science and Engineering, Erode Sengunthar Engineering College, Anna University Chennai, Tamilnadu. T. Kalai Selvi, Computer Science and Engineering, Erode Sengunthar Engineering College, Anna University Chennai, Tamilnadu.)

The hybrid method is a combination of the filter and wrapper methods. Thus, we will focus on the wrapper method in this paper. Several conventional feature selection algorithms are available, but we focus on the application of cluster analysis for a more effective feature selection process. Cluster analysis is the process of grouping similar objects into one class. To produce an optimal result, the affinity propagation algorithm has been studied and used in this paper. In general, affinity propagation is a flexible and simple clustering algorithm. It works through the concept of "message passing" between data objects. The key benefits of this algorithm are low error, high speed, and, prominently, no need to specify the number of clusters before executing the algorithm. The proposed feature selection process based on the affinity propagation algorithm produces an optimal subset of features with high accuracy and minimal time requirements.

The rest of the paper is organized as follows: in Section II, we discuss the related work. In Section III, we analyze the process of the existing work. In Section IV, we summarize the proposed work with a comparative analysis. Section V presents the conclusion of this paper.

II. RELATED WORK

Dimensionality reduction and the identification and removal of irrelevant and redundant features are accomplished through the feature selection process. Feature selection acts as a data preprocessing technique for producing the best possible subset of features. Several algorithms and schemes for feature selection are available; Relief is a well-known and good feature estimator.
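The message-passing idea behind affinity propagation can be sketched as follows: given a similarity matrix S, "responsibility" and "availability" messages are exchanged between points until a set of exemplars emerges, with the diagonal of S (the "preferences") controlling how many clusters appear rather than a preset cluster count. This is a generic minimal implementation of the standard algorithm, not the paper's specific feature selection procedure; the toy data, damping factor, and iteration count are illustrative assumptions.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iters=200):
    """Minimal sketch of affinity propagation on a similarity matrix S.

    Returns, for each point, the index of its chosen exemplar. The number
    of clusters is not specified in advance; it emerges from the
    preferences placed on the diagonal of S.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities: how well k suits i as exemplar
    A = np.zeros((n, n))  # availabilities: how appropriate it is for i to pick k
    for _ in range(iters):
        # Responsibility update: r(i,k) = s(i,k) - max_{k'!=k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf          # mask the best to find 2nd best
        second = np.max(AS, axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * Rnew
        # Availability update: a(i,k) = min(0, r(k,k) + sum_{i'!=i,k} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())        # keep self-responsibility as-is
        colsum = Rp.sum(axis=0)
        Anew = np.minimum(0, colsum[None, :] - Rp)
        np.fill_diagonal(Anew, colsum - Rp.diagonal())
        A = damping * A + (1 - damping) * Anew
    return np.argmax(A + R, axis=1)               # exemplar choice per point

# Toy data: two tight groups of 1-D points; two exemplars should emerge.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
S = -np.abs(X - X.T) ** 2                 # similarity = negative squared distance
np.fill_diagonal(S, np.median(S))         # preferences steer the cluster count
labels = affinity_propagation(S)
print(labels)                             # same exemplar within each group
```

For the paper's setting, S would instead hold pairwise similarities between feature vectors, so that each resulting cluster groups mutually similar features and one representative feature per cluster can be selected.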
The Relief algorithm [1], [4], [5] estimates the quality of a feature subset, but it successfully removes only irrelevant features and does not address redundant features. Mutual information [1], [6] is another method for determining the dependence between a pair of features, or between a feature and the target class. In [3], M. Dash and H. Liu et al. focus on an inconsistency measure for the feature selection process with