A Genetic Embedded Approach for Gene Selection and Classification of Microarray Data

Jose Crispin Hernandez Hernandez, Béatrice Duval, and Jin-Kao Hao

LERIA, Université d’Angers, 2 Boulevard Lavoisier, 49045 Angers, France
{josehh,bd,hao}@info.univ-angers.fr

Abstract. Classification of microarray data requires the selection of subsets of relevant genes in order to achieve good classification performance. This article presents a genetic embedded approach that performs the selection task for an SVM classifier. The main feature of the proposed approach is its highly specialized crossover and mutation operators, which take into account gene ranking information provided by the SVM classifier. The effectiveness of our approach is assessed using three well-known benchmark data sets from the literature, showing highly competitive results.

Keywords: Microarray gene expression, Feature selection, Genetic Algorithms, Support vector machines.

1 Introduction

Recent advances in DNA microarray technologies make it possible to consider molecular cancer diagnosis based on gene expression. Classification of tissue samples from gene expression levels aims to distinguish between normal and tumor samples, or to recognize particular kinds of tumors [9,2]. Gene expression levels are obtained by cDNA microarrays and high-density oligonucleotide chips, which make it possible to monitor and measure simultaneously the expression of thousands of genes in a sample. As a result, the data currently available in this field involve a very large number of variables (thousands of gene expressions) relative to a small number of observations (typically under one hundred samples). This characteristic, known as the "curse of dimensionality", is a difficult problem for classification methods and requires special techniques to reduce the data dimensionality in order to obtain reliable predictive results.
Feature selection aims at selecting a (small) subset of informative features from the initial data in order to obtain high classification accuracy [11]. In the literature there are two main approaches to solve this problem: the filter approach and the wrapper approach [11]. In the filter approach, feature selection is performed without taking into account the classification algorithm that will be applied to the selected features. So a filter algorithm generally relies on a relevance measure that evaluates the importance of each feature for the classification task. A feasible approach to filter selection is to rank all the features

E. Marchiori, J.H. Moore, and J.C. Rajapakse (Eds.): EvoBIO 2007, LNCS 4447, pp. 90–101, 2007. © Springer-Verlag Berlin Heidelberg 2007