ORIGINAL ARTICLE
M.S. Mohamad (*) · S. Omatu (*) · M. Yoshioka
Department of Computer Science and Intelligent Systems, Graduate
School of Engineering, Osaka Prefecture University, Sakai, Osaka
599-8531, Japan
e-mail: mohd.saberi@sig.cs.osakafu-u.ac.jp; omatu@cs.osakafu-u.ac.jp
S. Deris · M.F. Misman
Department of Software Engineering, Faculty of Computer Science
and Information Systems, Universiti Teknologi Malaysia, Johore,
Malaysia
This work was presented in part at the 13th International Symposium
on Artificial Life and Robotics, Oita, Japan, January 31–February 2,
2008
Artif Life Robotics (2009) 13:414–417 © ISAROB 2009
DOI 10.1007/s10015-008-0534-4
Mohd Saberi Mohamad · Sigeru Omatu · Safaai Deris
Muhammad Faiz Misman · Michifumi Yoshioka
Selecting informative genes from microarray data by using hybrid methods
for cancer classification
data. This classic approach may fail when dealing with
atypical tumors or morphologically indistinguishable
tumor subtypes. Advances in the area of microarray-based
expression analysis have led to the promise of cancer diagnosis
using new molecular-based approaches.¹ A microarray
machine is used to measure the expression levels of
thousands of genes simultaneously in a cell mixture, and
finally it produces microarray data. The task of cancer clas-
sification using microarray data is to classify tissue samples
into related classes of phenotypes, e.g., cancer versus
normal.² Given N tissue samples and expression of M genes,
microarray data are stored in a matrix, as shown in Fig. 1.
Cancer classification using these data poses a major chal-
lenge because of the following characteristics:
• M >> N. M is in the range 2000–20 000, while N is in the
range 30–200;
• most genes are not relevant for classifying different tissue
types;
• these data have a noisy nature.
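The three characteristics above can be made concrete with a small synthetic example. The Python sketch below builds a toy expression matrix with far more genes than samples, in which only a few genes carry class signal and every value is noisy. The function name make_toy_microarray, the row-per-sample orientation, the sizes, and the Gaussian values are assumptions chosen for illustration; they are not the data sets or the layout of Fig. 1.

```python
import random

def make_toy_microarray(n_samples, n_genes, n_informative, seed=0):
    """Return (X, y): X is an N x M list of lists, y holds class labels 0/1.

    Hypothetical helper for illustration only; real microarray data
    would be loaded from a file instead of being simulated.
    """
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]  # e.g. 0 = normal, 1 = cancer
    X = []
    for label in y:
        # Every gene gets Gaussian noise (the "noisy nature" of the data).
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        # Only the first n_informative genes differ between the classes;
        # all remaining genes are irrelevant for classification.
        for g in range(n_informative):
            row[g] += 2.0 * label
        X.append(row)
    return X, y

# M >> N: 2000 genes measured on only 40 tissue samples.
X, y = make_toy_microarray(n_samples=40, n_genes=2000, n_informative=10)
print(len(X), "samples x", len(X[0]), "genes")
```

With so few samples per gene, a classifier trained on all M columns can easily fit noise, which is why a gene selection step is usually applied first.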
To overcome the challenge, a gene selection approach is
usually used to select a small subset of informative genes
that maximizes the classifier’s ability to classify samples
accurately.² This approach has several advantages:
• it can maintain or improve classification accuracy;
• it can reduce the dimensionality of data;
• it can remove noisy genes.
Gene selection methods can be classified into two cate-
gories. If gene selection is carried out independently from
the classification procedure, the method belongs to the filter
approach. Otherwise, it is said to follow a hybrid approach.
Most previous work has used the filter approach to select
genes, since it is computationally more efficient than the
hybrid approach. However, the hybrid approach usually
provides greater accuracy than the filter approach.³ In this
article, an approach using two hybrid methods is proposed
to select a small subset of informative genes for cancer
classification.
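The difference between the two categories can be sketched in a few lines of Python. In the sketch below, filter_scores ranks genes with a signal-to-noise score that never consults a classifier, while hybrid_select searches for a subset by repeatedly measuring classifier accuracy. The toy data, the nearest-centroid classifier, the leave-one-out estimate, and the greedy forward search are all stand-ins assumed for illustration; they are not the hybrid methods proposed in this article.

```python
import random
from statistics import mean, stdev

def toy_data(n_samples=30, n_genes=50, n_informative=3, seed=1):
    """Hypothetical toy data: only the first n_informative genes carry signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(3.0 * lab if g < n_informative else 0.0, 1.0)
          for g in range(n_genes)] for lab in y]
    return X, y

def filter_scores(X, y):
    """Filter approach: score each gene without using any classifier."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(mean(a) - mean(b)) / (stdev(a) + stdev(b)))
    return scores

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for lab in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == lab]
            centroid = [mean(r[g] for r in rows) for g in genes]
            dist[lab] = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
        correct += (min(dist, key=dist.get) == y[i])
    return correct / len(X)

def hybrid_select(X, y, candidates, max_genes=3):
    """Classifier-guided selection: add the gene that most improves accuracy."""
    chosen, best = [], 0.0
    while len(chosen) < max_genes:
        gains = [(loo_accuracy(X, y, chosen + [g]), g)
                 for g in candidates if g not in chosen]
        acc, g = max(gains)
        if acc <= best:  # stop when no candidate improves the accuracy
            break
        chosen, best = chosen + [g], acc
    return chosen, best

X, y = toy_data()
scores = filter_scores(X, y)
top10 = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:10]
subset, acc = hybrid_select(X, y, top10)
print("filter top 10:", top10)
print("classifier-guided subset:", subset, "LOO accuracy:", acc)
```

The filter step costs one pass over the genes, whereas the classifier-guided search retrains the classifier for every candidate subset, which illustrates why the filter approach is computationally cheaper.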
Abstract Gene expression technology, namely microar-
rays, offers the ability to measure the expression levels of
thousands of genes simultaneously in biological organisms.
Microarray data are expected to be of significant help in the
development of an efficient cancer diagnosis and classifica-
tion platform. A major problem in these data is that the
number of genes greatly exceeds the number of tissue
samples. These data also have noisy genes. It has been
shown in literature reviews that selecting a small subset of
informative genes can lead to improved classification accu-
racy. Therefore, this paper aims to select a small subset of
informative genes that are most relevant for cancer classifi-
cation. To achieve this aim, an approach using two hybrid
methods has been proposed. This approach is assessed and
evaluated on two well-known microarray data sets, showing
competitive results.
Key words Cancer classification · Genetic algorithm · Gene
selection · Hybrid method · Microarray data
1 Introduction
Traditional cancer diagnosis relies on a complex and
inexact combination of clinical and histopathological
Received and accepted: June 11, 2008