ORIGINAL RESEARCH ARTICLE

A Novel Hybrid Feature Selection Model for Classification of Neuromuscular Dystrophies Using Bhattacharyya Coefficient, Genetic Algorithm and Radial Basis Function Based Support Vector Machine

Divya Anand¹ · Babita Pandey² · Devendra K. Pandey³

Received: 7 June 2016 / Revised: 7 August 2016 / Accepted: 30 August 2016

© International Association of Scientists in the Interdisciplinary Areas and Springer-Verlag Berlin Heidelberg 2016

Abstract Accurate classification of neuromuscular disorders is important for providing proper treatment to patients. Microarray technology has recently been employed to monitor the activity, or expression level, of a large number of genes simultaneously. Gene expression data derived from microarray experiments usually involve a large number of genes but very few samples. The dimension of such data therefore needs to be reduced by finding a small set of discriminative genes that accurately classifies samples of various diseases. Our goal is thus to find a small subset of genes that ensures accurate classification of neuromuscular disorders. In this paper, we propose a novel hybrid feature selection model for the classification of neuromuscular disorders. Feature selection is performed in two phases that integrate the Bhattacharyya coefficient and a genetic algorithm (GA). In the first phase, we compute the Bhattacharyya coefficient to choose a candidate gene subset by removing the most redundant genes. In the second phase, the target gene subset is created by applying a GA to select the most discriminative gene subset, where the fitness function is calculated using a radial basis function support vector machine (RBF SVM). The proposed hybrid algorithm is applied to two publicly available microarray datasets of neuromuscular disorders.
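The first phase can be illustrated with a minimal sketch. The abstract does not state how the coefficient is computed, so the sketch assumes that each gene's expression within each class follows a univariate Gaussian. For two such distributions the Bhattacharyya distance is D_B = (μ1 - μ2)² / (4(σ1² + σ2²)) + ½ ln((σ1² + σ2²) / (2σ1σ2)), and the coefficient is BC = exp(-D_B), which ranges from 0 (no overlap) to 1 (identical distributions). It further assumes that a gene whose class-conditional distributions overlap heavily (high BC, averaged over all class pairs) is uninformative and is dropped, keeping the k genes with the lowest BC as the candidate subset. The function names and the cut-off k are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pvariance
from typing import Hashable, List, Sequence


def gaussian_bc(mu1: float, var1: float, mu2: float, var2: float, eps: float = 1e-12) -> float:
    """Bhattacharyya coefficient between two univariate Gaussians."""
    # Small epsilon avoids division by zero for genes that are constant within a class.
    var1, var2 = var1 + eps, var2 + eps
    s = var1 + var2
    d_b = 0.25 * (mu1 - mu2) ** 2 / s + 0.5 * math.log(s / (2.0 * math.sqrt(var1 * var2)))
    return math.exp(-d_b)


def gene_overlap(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Mean pairwise BC between the class-conditional Gaussians of one gene.

    Assumption (not from the paper): 1.0 means the classes are indistinguishable
    on this gene, values near 0 mean the gene separates them well.
    Requires at least two classes.
    """
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    stats = [(mean(g), pvariance(g)) for g in groups.values()]
    return mean(gaussian_bc(m1, v1, m2, v2) for (m1, v1), (m2, v2) in combinations(stats, 2))


def select_candidates(X: Sequence[Sequence[float]], labels: Sequence[Hashable], k: int) -> List[int]:
    """Hypothetical filter: keep the indices of the k genes with the lowest overlap.

    X is a samples-by-genes matrix given as a list of rows.
    """
    n_genes = len(X[0])
    scores = [gene_overlap([row[j] for row in X], labels) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j])[:k]
```

For example, given one gene that separates two classes cleanly and another whose expression is identically distributed in both, `select_candidates(X, labels, 1)` returns the index of the separating gene. Per-gene Gaussian statistics keep this filter cheap enough to run over thousands of genes before the more expensive wrapper search.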
The results are compared with those of two individual feature selection techniques, namely the Bhattacharyya coefficient and GA, and of one integrated technique, Bhattacharyya-GA, in which the GA fitness function is calculated using four other classifiers. The comparison shows that the proposed integrated method achieves better classification accuracy.

Keywords Bhattacharyya coefficient · Genetic algorithm · Support vector machine · Neuromuscular disorders · Microarray data · Radial basis function

1 Introduction

The neuromuscular system in the human body provides the vital forces needed to perform various actions [1]. Neuromuscular disorders are caused by gene mutations that affect the motor unit, and their symptoms are progressive in nature. According to the Muscular Dystrophy Foundation Australia, diagnosis relies on genetic testing, which involves direct examination of the DNA associated with a particular kind of neuromuscular disorder. Blood tests, which measure the levels of certain enzymes in the blood, are usually used for genetic testing. Nowadays, microarray technology is used to analyze and monitor the whole genome simultaneously [2, 3]. The problem with microarray data, however, is the small number of samples relative to the large number of genes [4]. Most genes in the samples carry no useful information because they are redundant, not differentially expressed, or not specific to the disease. Reducing the dimension, i.e., the number of genes (which act as features in machine learning), before the classification task is therefore essential for accurately diagnosing a disease.
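The second phase is a wrapper search over gene subsets. The sketch below, built on stated assumptions, encodes each chromosome as a binary mask over the candidate genes and uses tournament selection, single-point crossover, bit-flip mutation, and elitism; these operators, the population size, and the mutation rate are illustrative choices, not the authors' settings. The fitness is passed in as a function, so the RBF SVM can be swapped for the other classifiers used in the Bhattacharyya-GA comparison. The hypothetical helper `svm_rbf_fitness` scores a mask by the mean cross-validated accuracy of scikit-learn's `SVC(kernel="rbf")` and imports scikit-learn only when it is called.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]


def svm_rbf_fitness(X, y, folds: int = 3) -> Callable[[Mask], float]:
    """Hypothetical helper: fitness = mean k-fold accuracy of an RBF SVM on the masked genes."""
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = np.asarray(X), np.asarray(y)

    def fitness(mask: Mask) -> float:
        cols = [j for j, bit in enumerate(mask) if bit]
        if not cols:  # an empty gene subset cannot classify anything
            return 0.0
        clf = SVC(kernel="rbf", gamma="scale")
        # Stratified folds: every class needs at least `folds` samples.
        return float(cross_val_score(clf, X[:, cols], y, cv=folds).mean())

    return fitness


def _tournament(scored: Sequence[Tuple[float, Mask]], rng: random.Random) -> Mask:
    """Binary tournament: return the fitter of two randomly drawn individuals."""
    a, b = rng.sample(list(scored), 2)
    return a[1] if a[0] >= b[0] else b[1]


def genetic_select(n_genes: int, fitness: Callable[[Mask], float], pop_size: int = 20,
                   generations: int = 30, p_mut: float = 0.02, seed: int = 0) -> Tuple[Mask, float]:
    """Hypothetical GA over binary gene masks; returns the best evaluated mask and its fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in pop), key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best = scored[0][0], scored[0][1][:]
        nxt = [scored[0][1][:]]  # elitism: the current best survives unchanged
        while len(nxt) < pop_size:
            p1, p2 = _tournament(scored, rng), _tournament(scored, rng)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return best, best_score
```

In the hybrid pipeline one would, under these assumptions, restrict the expression matrix to the indices kept by the first phase and call `genetic_select(len(candidates), svm_rbf_fitness(X_candidates, y))`; the genes whose bits are set in the returned mask form the target subset. Keeping the classifier behind a plain function argument is what makes the comparison with other fitness classifiers a one-line change.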
Corresponding author: Divya Anand, divyaanand.y@gmail.com

¹ School of Computer Science and Engineering, Lovely Professional University, Chaheru, Punjab, India
² School of Computer Applications, Lovely Professional University, Chaheru, Punjab, India
³ School of Biosciences, Lovely Professional University, Chaheru, Punjab, India

Interdiscip Sci Comput Life Sci, DOI 10.1007/s12539-016-0183-6

It does not only increase the