Neuro-fuzzy Methodology for Selecting Genes Mediating Lung Cancer

Rajat K. De¹ and Anupam Ghosh²

¹ Department of Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
rajat@isical.ac.in
² Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
anupam.ghosh@rediffmail.com

Abstract. In this article, we describe neuro-fuzzy models under supervised and unsupervised learning for selecting a few possible genes mediating a disease. The methodology involves grouping genes by correlation coefficient, computed from microarray gene expression patterns. The most important group is selected using existing neuro-fuzzy systems [1,2,3,4,5]. Finally, a few possible genes are selected from the most important group using the same neuro-fuzzy systems. The effectiveness of the methodology has been demonstrated on lung cancer gene expression data sets. Its superiority has been established by comparison with four existing gene selection methods: SAM, SNR, NA and BR. The enrichment of each gene ontology category of the resulting genes was assessed by its P-value. The low P-values obtained indicate that the selected genes are biologically significant. The methodology also finds more true positive genes than the other existing algorithms.

1 Introduction

Gene selection refers to the task of selecting some informative genes. The goal of gene selection algorithms is to identify a small set of informative genes that best explains experimental variations. It is much cheaper to focus on a small number of informative genes, out of the whole genome, that are differentially expressed in various diseases. Therefore, using effective gene selection methods, a small list of highly informative genes, which have a direct or indirect role in causing diseases, can be discovered from the whole gene set [6]. These genes can then be used to construct a classifier for discriminating disease patterns.
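
To make the idea of filtering a small set of informative genes concrete, the following is a minimal sketch of a filter-style ranking by signal-to-noise ratio (SNR), one of the comparison methods named in the abstract. It is not the neuro-fuzzy methodology of this article. The text above does not define SNR, so the sketch assumes the commonly used two-class form |mean_a - mean_b| / (std_a + std_b); the function names and the toy expression matrix are hypothetical.

```python
from statistics import mean, stdev


def snr_scores(expression, labels):
    """Score each gene by |mean_a - mean_b| / (std_a + std_b).

    expression: list of gene rows, each a list of per-sample values.
    labels: 0/1 class label per sample (assumed: at least two samples per class).
    """
    scores = []
    for row in expression:
        class_a = [v for v, y in zip(row, labels) if y == 0]
        class_b = [v for v, y in zip(row, labels) if y == 1]
        denom = stdev(class_a) + stdev(class_b)
        # Guard against genes that are constant within both classes.
        scores.append(abs(mean(class_a) - mean(class_b)) / denom if denom > 0 else 0.0)
    return scores


def top_genes(expression, labels, k):
    """Return the indices of the k highest-scoring genes."""
    scores = snr_scores(expression, labels)
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return order[:k]


# Hypothetical toy data: 4 genes x 6 samples (3 per class).
expression = [
    [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # gene 0: strongly differential
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # gene 1: no shift between classes
    [3.0, 1.0, 2.0, 2.5, 3.5, 3.0],  # gene 2: weak shift, noisy
    [0.5, 0.6, 0.4, 1.5, 1.4, 1.6],  # gene 3: differential, smaller scale
]
labels = [0, 0, 0, 1, 1, 1]

selected = top_genes(expression, labels, 2)
print(selected)  # [0, 3]
```

A ranking of this kind scores each gene independently, so it ignores correlations between genes.
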
From a data mining point of view, gene selection can be viewed as a feature selection task of the kind widely used in the data preprocessing stage [7,8]. However, unlike feature selection in the machine learning literature, gene selection is characterized by a large disparity between a very large number of genes and a very small number of samples.

S.O. Kuznetsov et al. (Eds.): PReMI 2011, LNCS 6744, pp. 388–393, 2011. © Springer-Verlag Berlin Heidelberg 2011

Several attempts have been made during the past several years to develop methodologies, or to use feature selection algorithms, that select informative