A Modified Kolmogorov-Smirnov Correlation-Based Filter Algorithm for Feature Selection

P. Srinivasu 1, P. S. Avadhani 2, Tummala Pradeep 3
1 Associate Professor, Department of CSE, ANITS, Visakhapatnam
2 Professor, Department of CS&SE, Andhra University, Visakhapatnam
3 IV/IV CSE Student, Department of CSE, BIT, Ranchi, Jharkhand

Abstract: Feature selection is the technique of selecting a subset of relevant features from which a classification model can be constructed for a particular task. It is a preprocessing step of machine learning that is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving results. In this paper, a modified Kolmogorov-Smirnov Correlation-Based Filter algorithm for feature selection is proposed, based on the Kolmogorov-Smirnov statistic, which uses class-label information while comparing feature pairs. Results obtained from this algorithm are compared with two other algorithms capable of removing irrelevancy and redundancy: the Correlation-based Feature Selection algorithm (CFS) and the simple Kolmogorov-Smirnov Correlation-Based Filter (KS-CBF). Classification accuracy with the reduced feature set obtained by the proposed approach is evaluated using two standard classifiers, the Decision-Tree classifier and the K-NN classifier.

1 Introduction

Feature selection is a preprocessing step of machine learning that is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving results. In recent years, data has grown in both the number of instances and the number of features in many applications such as genome projects, text categorization, image retrieval, and customer relationship management. This growth in data and features causes serious problems for many machine learning algorithms with respect to scalability and learning performance.
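The two-sample Kolmogorov-Smirnov statistic underlying KS-based filters measures the maximum distance between the empirical cumulative distribution functions of two samples. As a minimal pure-Python sketch (an illustration of the statistic itself, not the paper's full algorithm; the function name is ours):

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum is attained at an observed value
        f_a = sum(1 for v in a if v <= x) / len(a)  # empirical CDF of a at x
        f_b = sum(1 for v in b if v <= x) / len(b)  # empirical CDF of b at x
        d = max(d, abs(f_a - f_b))
    return d
```

For example, `ks_statistic([0, 0, 0], [1, 1, 1])` returns 1.0 (fully separated samples), while two identical samples give 0.0; intermediate overlap yields a value in between.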
Hence, feature selection is very important for machine learning tasks that involve high-dimensional data. Feature selection evaluation methods fall into two broad categories: the filter model and the wrapper model [2]. The filter model relies on characteristics of the training data to select features without involving any learning algorithm. The wrapper model requires one predetermined learning algorithm and uses its performance to evaluate and determine which features are to be selected. For each new subset of features, the wrapper model needs to learn a hypothesis/classifier. It tends to find features better suited to the predetermined learning algorithm, resulting in superior learning performance, but it also tends to be more computationally expensive and less general than the filter model. When the number of features is large, the filter model can be used because of its computational efficiency. Filters have the advantages of faster execution and greater generality across a large family of classifiers than wrappers [13]. Figure 1 depicts a simple classification process that includes a filter-based feature selection step. The training and testing datasets, after the dimensionality reduction step, are fed to the ML (Machine Learning) algorithm. In this paper, we employ a filter model for the evaluation of selected features.
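As an illustration of the filter model described above, the following sketch scores each feature by the KS distance between its two class-conditional distributions and keeps the top k, without training any classifier during selection. This is a generic KS-based filter written under a binary-class assumption; the names (`ks_filter`) and the selection-by-top-k scheme are ours, not the paper's proposed algorithm.

```python
def ks_statistic(sample_a, sample_b):
    # Maximum gap between the empirical CDFs of the two samples.
    a, b = sorted(sample_a), sorted(sample_b)
    return max(abs(sum(v <= x for v in a) / len(a)
                   - sum(v <= x for v in b) / len(b))
               for x in a + b)

def ks_filter(X, y, k):
    """Filter-model feature selection: score each feature by the KS
    distance between its values in class 0 and class 1, then keep the
    k highest-scoring feature indices (no classifier is trained here)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((ks_statistic(col0, col1), j))
    scores.sort(reverse=True)          # most discriminative first
    return [j for _, j in scores[:k]]  # selected feature indices
```

On a toy dataset where feature 0 separates the classes and feature 1 is constant, `ks_filter` keeps feature 0: a relevant feature scores high, an irrelevant one scores zero. The reduced data would then be fed to a classifier such as a Decision Tree or K-NN, matching the pipeline of Figure 1.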