ORIGINAL ARTICLE

Genetic algorithm based cancerous gene identification from microarray data using ensemble of filter methods

Manosij Ghosh¹*, Sukdev Adhikary¹, Kushal Kanti Ghosh¹, Aritra Sardar¹, Shemim Begum², Ram Sarkar¹

¹ Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
² Department of Computer Science and Engineering, Government College of Engineering & Textile Technology, Berhampore, West Bengal, India

* Manosij Ghosh, manosij1996@gmail.com
Sukdev Adhikary, sukdev.999@gmail.com; Kushal Kanti Ghosh, kushalkanti1999@gmail.com; Aritra Sardar, sardararitra97@gmail.com; Shemim Begum, shemim_begum@yahoo.com; Ram Sarkar, raamsarkar@gmail.com

Medical & Biological Engineering & Computing, https://doi.org/10.1007/s11517-018-1874-4
Received: 9 February 2018 / Accepted: 12 July 2018
© International Federation for Medical and Biological Engineering 2018

Abstract
Microarray datasets play a crucial role in cancer detection. However, their high dimensionality makes classification challenging because of the many irrelevant and redundant features they contain. Hence, feature selection is indispensable in this field, because it removes unnecessary features from the system. As selecting the optimal subset of features is an NP-hard problem, meta-heuristic search techniques help to cope with it. In this paper, we propose a two-stage model for feature selection in microarray datasets. The gene rankings produced by different filter methods are quite diverse, and the effectiveness of each ranking is dataset dependent. First, we develop an ensemble of filter methods by considering the union and intersection of the top-n features of ReliefF, chi-square, and symmetrical uncertainty. This ensemble allows us to combine the information of all three rankings in a single subset. In the next stage, we apply a genetic algorithm (GA) to the union and to the intersection to fine-tune the results; the union performs better than the intersection. We show that our model is classifier independent by using three classifiers: multi-layer perceptron (MLP), support vector machine (SVM), and K-nearest neighbor (K-NN). We have tested our model on five cancer datasets: colon, lung, leukemia, SRBCT, and prostate. Experimental results illustrate the superiority of our model over state-of-the-art methods.

Keywords Wrapper method · Filter method · Ensemble · Microarray data · Cancer detection

1 Introduction

DNA microarrays provide the expression profiles of many genes, which allow insights into the physiological processes and disease etiology mediated by those genes. The expression of a gene is regulated during the transcription of DNA into messenger ribonucleic acid (mRNA). Even though differential degradation of mRNA in the cytoplasm and other mechanisms also contribute to regulation, the relative quantity of mRNA species in cells is of great interest. The functions of genes are ascertained from the upregulation and downregulation of their expression, which gives gene expression values significant importance. In a normal cell, continuous mutation damages the DNA, which may impair cell replication. This is one of the main causes of the formation of malignant tumor cells. Microarray gene expression data contain information regarding the expression levels of genes in certain tissues and cells. These data serve as a key source of information in various biological studies and analyses. Microarray data are therefore very useful in the field of tumor and cancerous gene detection. There are quite a few types of cancer, depending on where these cancers form and also depending on the types of cells