A Genetic Embedded Approach for Gene Selection and Classification of Microarray Data

Jose Crispin Hernandez Hernandez, Béatrice Duval, and Jin-Kao Hao

LERIA, Université d’Angers, 2 Boulevard Lavoisier, 49045 Angers, France
{josehh,bd,hao}@info.univ-angers.fr

Abstract. Classification of microarray data requires the selection of subsets of relevant genes in order to achieve good classification performance. This article presents a genetic embedded approach that performs the selection task for an SVM classifier. The main feature of the proposed approach is its highly specialized crossover and mutation operators, which take into account gene ranking information provided by the SVM classifier. The effectiveness of our approach is assessed using three well-known benchmark data sets from the literature, showing highly competitive results.

Keywords: Microarray gene expression, Feature selection, Genetic Algorithms, Support vector machines.

1 Introduction

Recent advances in DNA microarray technologies make it possible to consider molecular cancer diagnosis based on gene expression. Classification of tissue samples from gene expression levels aims to distinguish between normal and tumor samples, or to recognize particular kinds of tumors [9,2]. Gene expression levels are obtained by cDNA microarrays and high-density oligonucleotide chips, which make it possible to monitor and measure simultaneously the expression of thousands of genes in a sample. The data currently available in this field therefore involve a very large number of variables (thousands of gene expressions) relative to a small number of observations (typically under one hundred samples). This characteristic, known as the “curse of dimensionality”, is a difficult problem for classification methods and requires special techniques to reduce the data dimensionality in order to obtain reliable predictive results.
E. Marchiori, J.H. Moore, and J.C. Rajapakse (Eds.): EvoBIO 2007, LNCS 4447, pp. 90–101, 2007. © Springer-Verlag Berlin Heidelberg 2007

Feature selection aims at selecting a (small) subset of informative features from the initial data in order to obtain high classification accuracy [11]. The literature describes two main approaches to this problem: the filter approach and the wrapper approach [11]. In the filter approach, feature selection is performed without taking into account the classification algorithm that will be applied to the selected features. A filter algorithm therefore generally relies on a relevance measure that evaluates the importance of each feature for the classification task. A feasible approach to filter selection is to rank all the features