BioSystems 85 (2006) 165–176
Interpretable gene expression classifier with an accurate and
compact fuzzy rule base for microarray data analysis
Shinn-Ying Ho a,b,∗, Chih-Hung Hsieh b, Hung-Ming Chen c, Hui-Ling Huang d
a Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
b Institute of Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan
c Institute of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
d Department of Information Management, Jin Wen Institute of Technology, Hsin-Tien, Taipei, Taiwan
Received 10 August 2005; received in revised form 14 December 2005; accepted 3 January 2006
Abstract
An accurate classifier with linguistic interpretability using a small number of relevant genes is beneficial to microarray data analysis
and development of inexpensive diagnostic tests. Several frequently used techniques for designing classifiers of microarray data,
such as support vector machines, neural networks, k-nearest neighbor, and logistic regression models, suffer from low interpretability.
This paper proposes an interpretable gene expression classifier (named iGEC) with an accurate and compact fuzzy rule base for
microarray data analysis. The design of iGEC has three objectives to be simultaneously optimized: maximal classification accuracy,
minimal number of rules, and minimal number of used genes. An “intelligent” genetic algorithm (IGA) is used to efficiently solve
the design problem with a large number of tuning parameters. The performance of iGEC is evaluated using eight commonly-used
data sets. On average, iGEC achieves an accurate, concise, and interpretable rule base (1.1 rules per class) in terms of test
classification accuracy (87.9%), number of rules (3.9), and number of genes used (5.0). Moreover, iGEC not only has better performance
than the existing fuzzy rule-based classifier in terms of the above-mentioned objectives, but also is more accurate than some existing
non-rule-based classifiers.
© 2006 Elsevier Ireland Ltd. All rights reserved.
Keywords: Fuzzy classifier; Gene expression; Intelligent genetic algorithm; Microarray data analysis; Pattern recognition
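The three design objectives named in the abstract (maximal accuracy, minimal number of rules, minimal number of used genes) can be illustrated with a small sketch. This section does not give the fitness function that iGEC actually uses, so the weighted-sum form below, the `Candidate` structure, and the weights `w_rules` and `w_genes` are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Summary of one candidate rule base (hypothetical structure)."""
    accuracy: float  # fraction of samples classified correctly, in [0, 1]
    n_rules: int     # number of fuzzy rules in the rule base
    n_genes: int     # number of distinct genes used by the rules


def fitness(c: Candidate, w_rules: float = 0.01, w_genes: float = 0.01) -> float:
    """Weighted-sum score (higher is better).

    Assumption: a weighted sum is only one common way to fold several
    objectives into one score; the weights are illustrative, not from the paper.
    """
    return c.accuracy - w_rules * c.n_rules - w_genes * c.n_genes


# Two made-up candidates with equal accuracy but different sizes.
compact = Candidate(accuracy=0.90, n_rules=4, n_genes=5)
bulky = Candidate(accuracy=0.90, n_rules=12, n_genes=30)

# With equal accuracy, the smaller rule base receives the higher score.
best = max([compact, bulky], key=fitness)
```

Under this assumed scoring, two rule bases with the same accuracy are ranked by compactness, which mirrors the stated preference for few rules and few genes.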
1. Introduction
Microarray technology is a useful technique for measuring
the expression of thousands of genes simultaneously.
Microarray gene expression profiling is one of the most
important research topics in the clinical diagnosis of
disease. Gene expression data provide valuable
information for understanding genes, biological
networks, and cellular states. One goal in analyzing
expression data is to determine how the expression of
any particular gene might affect the expression of other
genes in the same genetic network (Ressom et al., 2003;
Woolf and Wang, 2000; Kauffman et al., 2003; Wahde
and Hertz, 2000). Another goal is to determine how genes
are expressed as a result of certain cellular conditions
(e.g., how genes are expressed in diseased and healthy
cells) (Creighton and Hanash, 2003).
∗ Corresponding author. Tel.: +886 35131405; fax: +886 35729288.
E-mail address: syho@mail.nctu.edu.tw (S.-Y. Ho).
The practical applications of microarray gene expression
profiles include the management of cancer and infectious
diseases. Predicting the diagnostic category of a tissue
sample from its expression array phenotype, given
expression data from tissues in identified categories, is
known as classification. Because the number of tissue samples is usually
much smaller than the number of genes, it may occur
0303-2647/$ – see front matter © 2006 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.biosystems.2006.01.002