ORIGINAL ARTICLE

Artif Life Robotics (2009) 13:414–417
© ISAROB 2009
DOI 10.1007/s10015-008-0534-4

Mohd Saberi Mohamad · Sigeru Omatu · Safaai Deris · Muhammad Faiz Misman · Michifumi Yoshioka

Selecting informative genes from microarray data by using hybrid methods for cancer classification

Received and accepted: June 11, 2008

Abstract Gene expression technology, namely microarrays, offers the ability to measure the expression levels of thousands of genes simultaneously in biological organisms. Microarray data are expected to be of significant help in the development of an efficient cancer diagnosis and classification platform. A major problem in these data is that the number of genes greatly exceeds the number of tissue samples. These data also have noisy genes. It has been shown in literature reviews that selecting a small subset of informative genes can lead to improved classification accuracy. Therefore, this paper aims to select a small subset of informative genes that are most relevant for cancer classification. To achieve this aim, an approach using two hybrid methods has been proposed. This approach is assessed and evaluated on two well-known microarray data sets, showing competitive results.

Key words Cancer classification · Genetic algorithm · Gene selection · Hybrid method · Microarray data

M.S. Mohamad (*) · S. Omatu (*) · M. Yoshioka
Department of Computer Science and Intelligent Systems, Graduate School of Engineering, Osaka Prefecture University, Sakai, Osaka 599-8531, Japan
e-mail: mohd.saberi@sig.cs.osakafu-u.ac.jp; omatu@cs.osakafu-u.ac.jp

S. Deris · M.F. Misman
Department of Software Engineering, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, Johore, Malaysia

This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008

1 Introduction

Traditional cancer diagnosis relies on a complex and inexact combination of clinical and histopathological data. This classic approach may fail when dealing with atypical tumors or morphologically indistinguishable tumor subtypes. Advances in the area of microarray-based expression analysis have led to the promise of cancer diagnosis using new molecular-based approaches.¹ A microarray machine is used to measure the expression levels of thousands of genes simultaneously in a cell mixture, and finally it produces microarray data. The task of cancer classification using microarray data is to classify tissue samples into related classes of phenotypes, e.g., cancer versus normal.² Given N tissue samples and expression of M genes, microarray data are stored in a matrix, as shown in Fig. 1. Cancer classification using these data poses a major challenge because of the following characteristics:

• M >> N. M is in the range 2000–20 000, while N is in the range 30–200;
• most genes are not relevant for classifying different tissue types;
• these data have a noisy nature.

To overcome the challenge, a gene selection approach is usually used to select a small subset of informative genes that maximizes the classifier’s ability to classify samples accurately.² This approach has several advantages:

• it can maintain or improve classification accuracy;
• it can reduce the dimensionality of data;
• it can remove noisy genes.

Gene selection methods can be classified into two categories. If gene selection is carried out independently from the classification procedure, the method belongs to the filter approach. Otherwise, it is said to follow a hybrid approach. Most previous work has used the filter approach to select genes, since it is computationally more efficient than the hybrid approach. However, the hybrid approach usually provides greater accuracy than the filter approach.³ In this article, an approach using two hybrid methods is proposed to select a small subset of informative genes for cancer classification.
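
The two categories of gene selection described in the introduction can be illustrated with a minimal Python sketch. It is not the method proposed in this article: the synthetic N × M matrix, the signal-to-noise filter score, the nearest-centroid classifier, and all function names are assumptions chosen only for illustration. The filter step scores each gene without any classifier, whereas the classifier-dependent step judges a candidate gene subset by the leave-one-out accuracy a classifier reaches on it, which is the kind of evaluation a hybrid approach relies on.

```python
import random
from statistics import mean, stdev


def make_data(n_samples=40, n_genes=200, n_informative=5, seed=0):
    """Build a synthetic N x M expression matrix (hypothetical data)."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # 0 = normal, 1 = cancer
    matrix = []
    for label in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 4.0 * label  # only the first genes carry class signal
        matrix.append(row)
    return matrix, labels


def filter_scores(matrix, labels):
    """Filter approach: score every gene independently of any classifier."""
    scores = []
    for g in range(len(matrix[0])):
        a = [row[g] for row, lab in zip(matrix, labels) if lab == 0]
        b = [row[g] for row, lab in zip(matrix, labels) if lab == 1]
        # Signal-to-noise ratio between the two classes (assumed score).
        scores.append(abs(mean(a) - mean(b)) / (stdev(a) + stdev(b)))
    return scores


def loo_accuracy(matrix, labels, genes):
    """Classifier-dependent evaluation of a gene subset: leave-one-out
    accuracy of a nearest-centroid classifier restricted to `genes`."""
    correct = 0
    n = len(matrix)
    for i in range(n):
        dist = {}
        for c in (0, 1):
            rows = [matrix[j] for j in range(n) if j != i and labels[j] == c]
            centroid = [mean(r[g] for r in rows) for g in genes]
            dist[c] = sum((matrix[i][g] - m) ** 2
                          for g, m in zip(genes, centroid))
        predicted = min(dist, key=dist.get)
        correct += (predicted == labels[i])
    return correct / n


X, y = make_data()
scores = filter_scores(X, y)
# Keep the five genes with the highest filter score.
top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:5]
acc_top = loo_accuracy(X, y, top)
# For comparison, an arbitrary subset of genes that carry no class signal.
acc_noise = loo_accuracy(X, y, [100, 101, 102, 103, 104])
print("selected genes:", sorted(top))
print("accuracy (selected):", acc_top, "accuracy (uninformative):", acc_noise)
```

In this sketch the filter step costs one pass over the genes, while every call to `loo_accuracy` retrains the classifier N times, which reflects why the filter approach is computationally cheaper than an approach that evaluates gene subsets through the classifier.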