BioSystems 85 (2006) 165–176

Interpretable gene expression classifier with an accurate and compact fuzzy rule base for microarray data analysis

Shinn-Ying Ho a,b,*, Chih-Hung Hsieh b, Hung-Ming Chen c, Hui-Ling Huang d

a Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
b Institute of Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan
c Institute of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
d Department of Information Management, Jin Wen Institute of Technology, Hsin-Tien, Taipei, Taiwan

Received 10 August 2005; received in revised form 14 December 2005; accepted 3 January 2006

Abstract

An accurate classifier with linguistic interpretability using a small number of relevant genes is beneficial to microarray data analysis and to the development of inexpensive diagnostic tests. Several frequently used techniques for designing classifiers of microarray data, such as support vector machines, neural networks, k-nearest neighbor, and logistic regression models, suffer from low interpretability. This paper proposes an interpretable gene expression classifier (named iGEC) with an accurate and compact fuzzy rule base for microarray data analysis. The design of iGEC has three objectives to be optimized simultaneously: maximal classification accuracy, minimal number of rules, and minimal number of used genes. An "intelligent" genetic algorithm (IGA) is used to efficiently solve the design problem, which has a large number of tuning parameters. The performance of iGEC is evaluated using eight commonly used data sets. The results show that iGEC has an accurate, concise, and interpretable rule base (1.1 rules per class) with, on average, a test classification accuracy of 87.9%, 3.9 rules, and 5.0 used genes.
Moreover, iGEC not only performs better than the existing fuzzy rule-based classifier in terms of the above-mentioned objectives, but is also more accurate than some existing non-rule-based classifiers.

© 2006 Elsevier Ireland Ltd. All rights reserved.

Keywords: Fuzzy classifier; Gene expression; Intelligent genetic algorithm; Microarray data analysis; Pattern recognition

* Corresponding author. Tel.: +886 35131405; fax: +886 35729288. E-mail address: syho@mail.nctu.edu.tw (S.-Y. Ho).

0303-2647/$ – see front matter. doi:10.1016/j.biosystems.2006.01.002

1. Introduction

Microarrays are a useful technique for measuring the expression of thousands of genes simultaneously. Microarray gene expression profiling technology is one of the most important research topics in the clinical diagnosis of disease. Gene expression data provide valuable information for understanding genes, biological networks, and cellular states. One goal in analyzing expression data is to determine how the expression of any particular gene might affect the expression of other genes in the same genetic network (Ressom et al., 2003; Woolf and Wang, 2000; Kauffman et al., 2003; Wahde and Hertz, 2000). Another goal is to determine how genes are expressed as a result of certain cellular conditions (e.g., how genes are expressed in diseased and healthy cells) (Creighton and Hanash, 2003).

The practical applications of microarray gene expression profiles include the management of cancer and infectious diseases. Predicting the diagnostic category of a tissue sample from its expression array phenotype, given tissues in identified categories, is known as classification. Because the number of tissue samples is usually much smaller than the number of genes, it may occur