Feature Selection Algorithms in Classification Problems: An Experimental Evaluation

MICHAEL DOUMPOS, ATHINA SALAPPA
Department of Production Engineering and Management
Technical University of Crete
University Campus, 73100 Chania
Greece

Abstract: Feature selection (FS) is a major issue in developing efficient pattern recognition systems. FS refers to the selection of the most appropriate subset of features that adequately describes a given classification task. The objective of this paper is to perform a thorough analysis of the performance and efficiency of feature selection algorithms (FSAs). The analysis covers a variety of important issues with respect to the functionality of FSAs, such as: (a) their ability to identify relevant features, (b) the performance of the classification models developed on a reduced set of features, (c) the reduction in the number of features, and (d) the interactions between different FSAs and the techniques used to develop a classification model. The analysis considers a variety of FSAs and classification methods.

Key-Words: Feature Selection, Knowledge Discovery, Pattern Recognition, Machine Learning.

1 Introduction

A classification problem involves the assignment of some objects to a set of predefined classes. Each object $i$ is assumed to be a multivariate vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, where $x_{ij}$ is the description of object $i$ on feature $x_j$. Essentially, the objective in a classification problem is to identify an unknown mapping function $f(x)$ that assigns each object to one of the predefined classes as accurately as possible. The development of $f$ is based on a training sample consisting of $m$ objects $(x_1, c_1), (x_2, c_2), \ldots, (x_m, c_m)$, where $c_i$ denotes the class assignment for object $i$. Given such a training sample, the specification of $f$ can be performed in many different ways using well-known methods.
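
To make the notation concrete, the following minimal sketch builds a toy training sample of m = 6 objects described by n = 2 features and derives one possible mapping f from it. It is an illustration only, not the method used in the paper: the data values, the helper names `fit_centroids` and `predict`, and the choice of a nearest-centroid rule are all assumptions introduced here.

```python
from collections import defaultdict
from math import dist

# Toy training sample (x_i, c_i): m = 6 objects, n = 2 features.
# The values are made up for illustration only.
training_sample = [
    ((1.0, 1.2), "A"),
    ((0.8, 1.0), "A"),
    ((1.2, 0.9), "A"),
    ((3.0, 3.1), "B"),
    ((3.2, 2.9), "B"),
    ((2.9, 3.3), "B"),
]


def fit_centroids(sample):
    """Return the mean feature vector of each class in the sample."""
    groups = defaultdict(list)
    for x, c in sample:
        groups[c].append(x)
    return {
        c: tuple(sum(col) / len(xs) for col in zip(*xs))
        for c, xs in groups.items()
    }


def predict(centroids, x):
    """One possible f(x): assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: dist(centroids[c], x))


centroids = fit_centroids(training_sample)

# Fraction of training objects that f assigns to their known class.
training_accuracy = sum(
    predict(centroids, x) == c for x, c in training_sample
) / len(training_sample)
```

Calling `predict(centroids, x)` on a new feature vector `x` plays the role of evaluating f(x). Any other classification method could be substituted for the nearest-centroid rule without changing the structure of the training sample.
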
The appropriate specification of the classification model f depends strongly on the quality of the training data. This is mainly related to the number of training objects and the adequacy of the features used in the analysis. FS involves the latter issue. The FS problem refers to the selection of the appropriate features that should be introduced in the analysis in order to maximize the performance of the resulting model. This has significant implications for issues such as [7]: (1) noise reduction through the elimination of noisy features, (2) reduction of the computational effort required to develop and implement an appropriate model, (3) simplification of the resulting models, and (4) easier use and updating of the models.

FS is usually performed as a preprocessing stage prior to model development, using special algorithms. The development of FSAs has been an active research topic in data mining and machine learning. FSAs are computational processes that select a set of features optimizing an evaluation measure, which represents the quality of the features. The research on this topic has mainly focused on algorithmic developments, experimental evaluations and real-world applications. However, most of the previous studies on the evaluation of FSAs' performance have focused on a limited number of algorithms and a limited number of classification methods.

This paper provides an extensive analysis of FSAs' performance in an experimental context using both real-world data sets and artificially generated data with pre-specified characteristics. The contribution of the paper compared to previous studies involves the analysis of a variety of FSAs combined with different popular classification methods, including statistical and machine learning techniques.
Such an analysis enables the investigation of the interactions between FSAs and the methods used for model development, as well as between the FSAs' performance and the data set characteristics. The rest of the paper is organized as follows: Section 2 outlines the main characteristics and functionalities of FSAs, Section 3 describes the experimental setup, Section 4 presents the obtained