International Journal of Advanced Science and Technology, Vol. 29, No. 6 (2020), pp. 8012-8024
ISSN: 2005-4238 IJAST
Copyright 2020 SERSC

MRMR-BAT-HS: A Clinical Decision Support System for Cancer Diagnosis

1 Bibhuprasad Sahu*, 2 Amrutanshu Panigrahi, 3 Subhashree Sukla, 4 Bhramara Bar Biswal
1, 3 Dept of Computer Science and Engineering, Gandhi Institute For Technology, Bhubaneswar, Odisha, India
2 Research Scholar, SOA University, Bhubaneswar, Odisha, India
4 Dept of Computer Science and Application, College of Engineering, Bhubaneswar, Odisha, India
1 prasadnikhil176@gmail.com, 2 amrutansup89@gmail.com, 3 subhashreesukla@gift.edu.in, 4 b.b.biswal1969@gmail.com

Abstract

A novel clinical decision support system based on a hybrid multi-stage method, MRMR-BAT-HS, is proposed for biomarker gene selection from microarray datasets. Microarray datasets contain many irrelevant, redundant, and noisy genes, of which only a few informative genes are needed to identify cancer. Selecting informative genes from such a huge number of genes is a challenging task, commonly attributed to the curse of dimensionality. Optimization algorithms are used to solve the gene selection problem. The proposed method applies both filter and wrapper approaches to biomarker gene selection. In the filter stage, MRMR (minimum redundancy and maximum relevance) selects a subset of candidate genes. In the wrapper stage, two metaheuristics, BAT and Harmony Search (BAT-HS), are combined with a Support Vector Machine. The approach is applied to various microarray datasets, and its accuracy is assessed with leave-one-out cross-validation (LOOCV). The proposed method is compared with various gene selection methods, and the results suggest that it outperforms them. The functional relevance of the selected genes is also investigated to support the observed classification performance.
Keywords: MRMR, BAT, Harmony Search, SVM, Feature Selection.

1. Introduction

Microarray technology enables biologists to study thousands of genes in a single experiment, producing gene expression data. Such gene expression data has great potential in the study of diseases such as diabetes, cancer, Alzheimer's disease, and Parkinson's disease. These datasets provide genetic information about the patient, which can improve treatment decisions and classification accuracy. In microarray data analysis, the number of genes is much higher than the number of samples. Under this curse of dimensionality, achieving high accuracy requires identifying the informative genes among a huge number of genes, which is itself a challenging task. Selecting informative genes speeds up the machine learning algorithm and improves the interpretability of the induced models [1][2]. It may also reduce the problem of overfitting. There are three basic methods for feature selection: filter, wrapper, and embedded. The filter method works like a black box: it identifies gene subsets from the huge number of genes without considering the impact of the biomarkers on classification accuracy. The wrapper method adopts a search to identify significant genes by removing irrelevant and redundant features, and the selected genes are evaluated to improve classification accuracy [3][4]. In this research, we have adopted MRMR (minimum redundancy and maximum relevance) to select the subset of featured genes. In the embedded method, feature selection is embedded with classification because the gene selection is embedded with the