Artif Life Robotics (2009) 13:414–417
© ISAROB 2009
DOI 10.1007/s10015-008-0534-4

ORIGINAL ARTICLE

Mohd Saberi Mohamad · Sigeru Omatu · Safaai Deris · Muhammad Faiz Misman · Michifumi Yoshioka

Selecting informative genes from microarray data by using hybrid methods for cancer classification

Received and accepted: June 11, 2008

Abstract Gene expression technology, namely microarrays, offers the ability to measure the expression levels of thousands of genes simultaneously in biological organisms. Microarray data are expected to be of significant help in the development of an efficient cancer diagnosis and classification platform. A major problem in these data is that the number of genes greatly exceeds the number of tissue samples. These data also have noisy genes. It has been shown in literature reviews that selecting a small subset of informative genes can lead to improved classification accuracy. Therefore, this paper aims to select a small subset of informative genes that are most relevant for cancer classification. To achieve this aim, an approach using two hybrid methods has been proposed. This approach is assessed and evaluated on two well-known microarray data sets, showing competitive results.

Key words Cancer classification · Genetic algorithm · Gene selection · Hybrid method · Microarray data

M.S. Mohamad (*) · S. Omatu (*) · M. Yoshioka
Department of Computer Science and Intelligent Systems, Graduate School of Engineering, Osaka Prefecture University, Sakai, Osaka 599-8531, Japan
e-mail: mohd.saberi@sig.cs.osakafu-u.ac.jp; omatu@cs.osakafu-u.ac.jp

S. Deris · M.F. Misman
Department of Software Engineering, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, Johore, Malaysia

This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008

1 Introduction

Traditional cancer diagnosis relies on a complex and inexact combination of clinical and histopathological data. This classic approach may fail when dealing with atypical tumors or morphologically indistinguishable tumor subtypes. Advances in the area of microarray-based expression analysis have led to the promise of cancer diagnosis using new molecular-based approaches.¹ A microarray machine is used to measure the expression levels of thousands of genes simultaneously in a cell mixture, and finally it produces microarray data. The task of cancer classification using microarray data is to classify tissue samples into related classes of phenotypes, e.g., cancer versus normal.² Given N tissue samples and expression of M genes, microarray data are stored in a matrix, as shown in Fig. 1. Cancer classification using these data poses a major challenge because of the following characteristics:

• M >> N. M is in the range 2000–20 000, while N is in the range 30–200;
• most genes are not relevant for classifying different tissue types;
• these data have a noisy nature.

To overcome the challenge, a gene selection approach is usually used to select a small subset of informative genes that maximizes the classifier's ability to classify samples accurately.² This approach has several advantages:

• it can maintain or improve classification accuracy;
• it can reduce the dimensionality of data;
• it can remove noisy genes.

Gene selection methods can be classified into two categories. If gene selection is carried out independently from the classification procedure, the method belongs to the filter approach. Otherwise, it is said to follow a hybrid approach. Most previous work has used the filter approach to select genes, since it is computationally more efficient than the hybrid approach. However, the hybrid approach usually provides greater accuracy than the filter approach.³ In this article, an approach using two hybrid methods is proposed to select a small subset of informative genes for cancer classification.
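The filter approach described in the Introduction, in which genes are ranked without consulting any classifier, can be illustrated with a minimal sketch. The code below is not the method proposed in this article: the signal-to-noise score, the function names, the synthetic data, and the sizes N, M, and k are all assumptions chosen only to show an N × M expression matrix with M >> N being reduced to a small gene subset.

```python
import numpy as np


def signal_to_noise(X, y):
    """Score each gene by |mean_1 - mean_0| / (std_1 + std_0).

    X: (N samples, M genes) expression matrix; y: (N,) labels in {0, 1}.
    The score is an illustrative assumption, not the article's method.
    """
    X0, X1 = X[y == 0], X[y == 1]
    diff = np.abs(X1.mean(axis=0) - X0.mean(axis=0))
    spread = X1.std(axis=0) + X0.std(axis=0)
    return diff / (spread + 1e-12)  # small constant avoids division by zero


def select_top_genes(X, y, k):
    """Filter-style selection: rank genes independently of any classifier."""
    scores = signal_to_noise(X, y)
    # argsort is ascending, so reverse it and keep the k best-scoring genes
    return np.argsort(scores)[::-1][:k]


# Synthetic data with M >> N (hypothetical sizes, for illustration only).
rng = np.random.default_rng(0)
N, M, k = 40, 2000, 10
y = np.array([0] * (N // 2) + [1] * (N // 2))
X = rng.normal(size=(N, M))
X[y == 1, :5] += 3.0  # genes 0..4 are made informative; the rest are noise

selected = select_top_genes(X, y, k)
X_reduced = X[:, selected]  # N x k matrix passed on to a classifier
print(sorted(selected.tolist()))
print(X_reduced.shape)
```

A hybrid method would differ in the scoring step: instead of a per-gene statistic, candidate gene subsets would be evaluated by the accuracy of a classifier trained on them, which is more costly but ties the selection to the classification procedure.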