Metaheuristics for feature selection: application to sepsis outcome prediction

Susana M. Vieira*, Luis F. Mendonça†*, Gonçalo J. Farinha* and João M.C. Sousa*
*Technical University of Lisbon, Instituto Superior Técnico, Dept. of Mechanical Engineering, CIS/IDMEC LAETA, Lisbon, Portugal
Email: susana.vieira@ist.utl.pt
†Escola Superior Náutica Infante D. Henrique, Department of Marine Engineering, Lisbon, Portugal

Abstract—This paper proposes the application of a new binary particle swarm optimization (BPSO) method to feature selection problems. Two enhanced versions of binary particle swarm optimization, designed to cope with premature convergence of the BPSO algorithm, are proposed. These methods control the swarm variability using the velocity and the similarity between the best swarm solutions. The proposed PSO methods use neural networks, fuzzy models and support vector machines in a wrapper approach, and are tested on a benchmark database. It is shown that the proposed BPSO approaches require shorter simulation times, select fewer features and increase accuracy. The best BPSO is then compared with genetic algorithms (GA) and applied to a real medical application, a sepsis patient database. The objective is to predict the outcome (survived or deceased) of the sepsis patients. It is shown that the proposed BPSO approaches are similar to GA in terms of model accuracy, while requiring shorter simulation times and fewer selected features.

I. INTRODUCTION

Knowledge is only valuable when it can be used efficiently and effectively. Therefore, extensive research has been devoted to finding new computational theories and tools that can aid the extraction of useful information (knowledge) from rapidly growing databases. The field of science concerned with automated knowledge discovery is called knowledge discovery in databases (KDD) [1]. The KDD process comprises a series of steps to extract knowledge from data.
The first step is selection, and it consists in acquiring the most useful target data from the available databases. The target database has to be chosen so that it contains sufficient information about the system we want to describe. The next two steps, feature construction and feature selection, are part of the feature extraction (FE) process and are used to extract the most relevant features of the target data. Feature construction (also called data preprocessing) [2] comprehends all the methods that involve some degree of modification of the original features, e.g. data standardization, normalization and noise filtering. The objective of this crucial preprocessing step is to make the underlying information in the data easier to identify. In contrast, feature selection (FS) does not transform the features; it simply searches for the optimal feature subset, discarding the features with the lowest informative potential. There is a large number of available FS techniques, but three aspects roughly differentiate them [2]:
• feature subset generation (or search strategy);
• evaluation criterion definition (e.g. relevance index or predictive performance);
• evaluation criterion estimation (or assessment method).
The first refers to the search strategy applied to evaluate solutions in the space of possible feature combinations. The last two correspond to the evaluation criterion, i.e. the method and measures used to assess the quality of each feature subset. Based on the subset evaluation procedure, FS algorithms can be divided into two classes: wrapper methods and filter methods. The main advantage of wrapper methods over filter methods is that, in wrappers, the predictive performance of the final selected subset is correlated with the chosen relevance measure.
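The wrapper idea above can be made concrete with a minimal, illustrative sketch: a candidate feature subset (a bit mask) is scored by the hold-out accuracy of a classifier trained only on the selected features. The nearest-centroid classifier and the toy data below are hypothetical stand-ins for the NN, SVM and fuzzy models used in this paper; `wrapper_score` is an assumed name, not from the original.

```python
def nearest_centroid_accuracy(train, test):
    """train/test: lists of (feature_vector, class_label) pairs."""
    # Compute one centroid per class from the training split.
    sums, counts = {}, {}
    for x, y in train:
        counts[y] = counts.get(y, 0) + 1
        sums[y] = [s + v for s, v in zip(sums.get(y, [0.0] * len(x)), x)]
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Assign the class whose centroid is closest (squared Euclidean).
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2
                                     for a, b in zip(x, centroids[y])))

    return sum(predict(x) == y for x, y in test) / len(test)

def wrapper_score(mask, train, test):
    """Wrapper evaluation: keep only features where mask[d] == 1,
    then train and score the classifier on that subset."""
    if not any(mask):
        return 0.0  # empty subset: nothing to learn from
    select = lambda x: [v for v, m in zip(x, mask) if m]
    return nearest_centroid_accuracy(
        [(select(x), y) for x, y in train],
        [(select(x), y) for x, y in test])

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
train = [([0.0, 5.0], 'a'), ([0.1, -3.0], 'a'),
         ([1.0, 4.0], 'b'), ([0.9, -5.0], 'b')]
test = [([0.05, -4.0], 'a'), ([0.95, 5.0], 'b')]
```

On this toy data, the subset containing only the informative feature, `wrapper_score([1, 0], train, test)`, scores 1.0, while the noise-only subset `[0, 1]` and even the full set `[1, 1]` score 0.0, which is exactly the situation that motivates discarding features of low informative potential.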
Since the objective of this work is to improve modeling quality using the information underlying large databases, it was decided to use wrapper methods. Nevertheless, wrapper methods have the associated problem of having to train a classifier for each tested feature subset, which makes testing all possible combinations of features virtually impossible. To solve this problem, several search heuristics have been proposed, e.g. genetic algorithms (GA), particle swarm optimization (PSO) and ant colony optimization (ACO). These methods are able to find fairly good solutions without searching the entire search space. The feature selection techniques under study are two modified versions of the binary PSO algorithm (BPSO) and genetic algorithms. They are used as wrapper methods, i.e. feature selection methods in which every candidate solution is evaluated using a learning machine [2]. In this work, we have chosen to use the following modeling methodologies: neural networks (NN), support vector machines (SVM) and fuzzy modeling (FM). The most important characteristic of these methods is their universal function approximation property [3]. The objective of this paper is the application of feature selection to a publicly available septic shock patient database in order to obtain more accurate models of the disease. We start by introducing the main concepts of modeling in Section II. Then, the implementation of wrapper methodologies is addressed in Section III. In Section IV, the applied metaheuristics are presented, namely the proposed BPSO-

U.S. Government work not protected by U.S. copyright
WCCI 2012 IEEE World Congress on Computational Intelligence, June 10-15, 2012, Brisbane, Australia (IEEE CEC)
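For orientation, the standard sigmoid-based BPSO that the proposed variants build on can be sketched as follows. This is a minimal illustration, not the enhanced methods of this paper: velocities are real-valued and a sigmoid maps each velocity to the probability that the corresponding feature bit is set. The toy fitness function is a hypothetical stand-in for wrapper accuracy, rewarding two "relevant" features and penalizing subset size; all names and parameter values here are illustrative assumptions.

```python
import math
import random

def bpso_feature_selection(fitness, n_features, n_particles=10,
                           n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard binary PSO: maximize `fitness` over bit masks."""
    rng = random.Random(seed)
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    # Random initial bit masks (feature subsets) and velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))  # avoid saturation
                # Sigmoid of the velocity gives the probability of bit = 1.
                pos[i][d] = 1 if rng.random() < sig(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy fitness standing in for wrapper accuracy: features 0 and 3 are
# "relevant"; every selected feature carries a small cardinality penalty,
# mirroring the preference for fewer selected features.
def toy_fitness(mask):
    return mask[0] + mask[3] - 0.1 * sum(mask)

best_mask, best_fit = bpso_feature_selection(toy_fitness, n_features=8)
```

Because every particle position is a feature mask, the same loop drives a wrapper simply by replacing `toy_fitness` with a model-accuracy score; the premature convergence this paper targets arises when all velocities saturate and the sigmoid probabilities pin every bit to the current global best.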