Expert Systems With Applications 146 (2020) 113185

An efficient binary social spider algorithm for feature selection problem

Emine BAŞ (a), Erkan ÜLKER (b)
(a) Kulu Vocational School, Selçuk University, 42075 Konya, Turkey
(b) Department of Computer Engineering, Faculty of Engineering and Nature Sciences, Konya Technical University, 42075 Konya, Turkey

Article history: Received 8 July 2019; Revised 19 November 2019; Accepted 5 January 2020; Available online 7 January 2020

Keywords: Social spider algorithm; Feature selection; Classifiers

Abstract

The social spider algorithm (SSA) is a heuristic algorithm modeled on spider behaviors to solve continuous problems. In this paper, a binary version of the social spider algorithm, called the binary social spider algorithm (BinSSA), is first proposed. The binary version of SSA has so far received insufficient attention in the literature. The main component of the binary version is the transfer function, which is responsible for mapping the continuous search space to a binary search space. In this study, eight transfer functions, divided into two families (S-shaped and V-shaped), are evaluated. BinSSA is obtained from SSA by transforming the continuous search space into a binary search space with these eight transfer functions. Thus, eight variations of BinSSA are formed: BinSSA1, BinSSA2, BinSSA3, BinSSA4, BinSSA5, BinSSA6, BinSSA7, and BinSSA8. To increase the exploration and exploitation capacity of BinSSA, a crossover operator is added, yielding BinSSA-CR. Secondly, the performance of the BinSSA variations is tested on the feature selection task, where finding the optimal subset of features is a challenging problem.
In this paper, the best BinSSA variation is identified according to several comparison criteria (mean of fitness values, standard deviation of fitness values, best of fitness values, worst of fitness values, accuracy values, mean number of selected features, and CPU time). In the feature selection problem, the K-nearest neighbor (K-NN) and support vector machine (SVM) classifiers are used. A detailed study is performed for the fixed parameter values used in the fitness function. BinSSA is evaluated on twenty-one well-known low-scale, middle-scale, and large-scale UCI datasets, and the obtained results are compared with state-of-the-art algorithms from the literature. The results show that BinSSA and BinSSA-CR achieve superior performance and offer high-quality, stable solutions.

© 2020 Elsevier Ltd. All rights reserved.

Corresponding author. E-mail addresses: emineozcan@selcuk.edu.tr (E. BAŞ), eulker@ktun.edu.tr (E. ÜLKER).

1. Introduction

Classification is one of the most common tasks in data mining (Yin & Gai, 2015). Because of the vast growth of data in the world, a pre-processing procedure such as feature selection becomes a challenging and fundamental task in data mining applications (Jensen, 2005). Feature selection improves the performance of the learning algorithm by removing repeated and irrelevant features from the data set (Liu & Motoda, 1998; Pal & Maiti, 2010). In feature selection, finding the optimal subset is a decisive issue. Exhaustive search produces all possible subsets by examining the entire set of features. This approach is impractical for large datasets and has an extremely high computational cost, because if a dataset holds M features, then there are 2^M subsets of features (Guyon & Elisseeff, 2003; Yin, Gai & Wang, 2016). Feature selection is applied in many different areas.
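The combinatorial explosion behind the 2^M claim above can be sketched in a few lines of Python (the feature names are placeholders, not from any dataset in the paper):

```python
from itertools import combinations

def all_feature_subsets(features):
    """Enumerate every non-empty subset of a feature set. There are
    2**M - 1 of them, which is why exhaustive search is impractical
    for large M."""
    for r in range(1, len(features) + 1):
        yield from combinations(features, r)

# For M = 4 features there are already 2**4 - 1 = 15 non-empty subsets.
n_subsets = sum(1 for _ in all_feature_subsets(["f1", "f2", "f3", "f4"]))
```

Doubling M squares the number of subsets, so heuristic search strategies such as BinSSA become the practical alternative.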
For example: text categorization (Uğuz, 2011), face recognition (Kanan & Faez, 2008), cancer classification (Yu, Gu, Liu, Shen & Zhao, 2009), gene classification (Tabakhi, Najafi, Ranjbar & Moradi, 2015), finance (Huang & Tsai, 2009), recommender systems (Ramezani, Moradi & Tab, 2013), and customer relationship management (Kuri-Morales & Rodríguez-Erazo, 2009). Many feature selection algorithms employ heuristic or random search strategies to find the most appropriate or optimal feature subset while reducing computation time. Feature selection methods can be categorized into filter (Ke, Feng & Ren, 2008; Sun, 2007; Yang & Honavar, 1998), wrapper (Abe, 2005; Chun-Nan, Hung-Ju & Dietrich, 2002; Guan, Liu & Qi, 2004; Qiao, Peng & Peng, 2006), hybrid (Huang, Cai & Xu, 2007; Kabir, Shahjahan & Murase, 2012; Sivagaminathan & Ramakrishnan, 2007), and embedded (Dash & Liu, 1997; Huan & Lei, 2005; Lai, Reinders & Wessels, 2006) approaches. The filter approach is not based on any particular learn-
https://doi.org/10.1016/j.eswa.2020.113185
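Two ingredients discussed so far, the S-/V-shaped transfer functions that binarize a continuous position and a wrapper-style fitness that a classifier-driven search would minimize, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the sigmoid and |tanh| are common members of the S and V families, and the weight alpha is an illustrative value, not necessarily the fixed parameter the authors study.

```python
import math
import random

def s_shaped(x):
    """S-shaped (sigmoid) transfer function: maps a continuous
    position component to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x):
    """V-shaped transfer function: |tanh(x)|, a common member of
    the V family, maps a continuous value into [0, 1)."""
    return abs(math.tanh(x))

def binarize(position, transfer, rng=random.random):
    """Map a continuous position vector to a binary feature mask by
    comparing each transfer value against a uniform random draw."""
    return [1 if transfer(x) > rng() else 0 for x in position]

def wrapper_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """A common wrapper-style fitness: a weighted sum of the
    classifier's error rate and the selected-feature ratio
    (alpha is illustrative). Lower is better."""
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

# Example: binarize a continuous spider position (fixed draw of 0.5
# for reproducibility), then score the resulting feature subset.
mask = binarize([2.5, -3.0, 0.8], s_shaped, rng=lambda: 0.5)
score = wrapper_fitness(error_rate=0.10, n_selected=sum(mask), n_total=3)
```

In a full wrapper search, `error_rate` would come from evaluating a classifier such as K-NN or SVM on the subset encoded by `mask`; the heuristic then keeps the mask with the lowest fitness.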