Expert Systems With Applications 146 (2020) 113185

An efficient binary social spider algorithm for feature selection problem

Emine BAŞ (a), Erkan ÜLKER (b)
(a) Kulu Vocational School, Selçuk University, 42075 Konya, Turkey
(b) Department of Computer Engineering, Faculty of Engineering and Nature Sciences, Konya Technical University, 42075 Konya, Turkey

Article history: Received 8 July 2019; Revised 19 November 2019; Accepted 5 January 2020; Available online 7 January 2020

Keywords: Social spider algorithm; Feature selection; Classifiers

Abstract

The social spider algorithm (SSA) is a heuristic algorithm modeled on spider behaviors to solve continuous problems. In this paper, a binary version of the social spider algorithm, called the binary social spider algorithm (BinSSA), is first proposed. The binary version of SSA has so far received insufficient attention in the literature. The main component of the binary version is the transfer function, which is responsible for mapping the continuous search space to a binary search space. In this study, eight transfer functions, divided into two families (S-shaped and V-shaped), are evaluated. BinSSA is obtained from SSA by transforming the continuous search space into a binary search space with these eight transfer functions. Thus, eight variations of BinSSA are formed: BinSSA1, BinSSA2, BinSSA3, BinSSA4, BinSSA5, BinSSA6, BinSSA7, and BinSSA8. To increase the exploration and exploitation capacity of BinSSA, a crossover operator is added, yielding BinSSA-CR. Secondly, the performance of the BinSSA variations is tested on the feature selection task, where finding the optimal subset of features is a challenging problem.
In this paper, the best BinSSA variation is identified according to several comparison criteria (mean of fitness values, standard deviation of fitness values, best of fitness values, worst of fitness values, accuracy values, mean number of selected features, and CPU time). In the feature selection problem, the K-nearest neighbor (K-NN) and support vector machine (SVM) classifiers are used. A detailed study is performed for the fixed parameter values used in the fitness function. BinSSA is evaluated on twenty-one well-known low-scale, middle-scale, and large-scale UCI datasets, and the obtained results are compared with state-of-the-art algorithms from the literature. The results show that BinSSA and BinSSA-CR achieve superior performance and offer high-quality, stable solutions.

© 2020 Elsevier Ltd. All rights reserved.

Corresponding author. E-mail addresses: emineozcan@selcuk.edu.tr (E. BAŞ), eulker@ktun.edu.tr (E. ÜLKER).

1. Introduction

Classification is one of the most common tasks in data mining (Yin & Gai, 2015). Because of the vast growth of data in the world, a pre-processing procedure such as feature selection becomes a challenging and fundamental task in data mining applications (Jensen, 2005). Feature selection improves the performance of the learning algorithm by removing repeated and irrelevant features from the data set (Liu & Motoda, 1998; Pal & Maiti, 2010). In feature selection, finding the optimal subset is a decisive issue. Exhaustive search produces all possible subsets by examining the entire set of features. This approach is impractical for large datasets and has an extremely high computational cost, because if a dataset holds M features, then there are 2^M subsets of features (Guyon & Elisseeff, 2003; Yin, Gai & Wang, 2016). Feature selection is applied in many different areas.
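The combinatorial explosion behind the 2^M claim above can be sketched in a few lines of Python (the feature names are placeholders, not from any dataset in the paper):

```python
from itertools import combinations

def all_feature_subsets(features):
    """Enumerate every non-empty subset of a feature set. There are
    2**M - 1 of them, which is why exhaustive search is impractical
    for large M."""
    for r in range(1, len(features) + 1):
        yield from combinations(features, r)

# For M = 4 features there are already 2**4 - 1 = 15 non-empty subsets.
n_subsets = sum(1 for _ in all_feature_subsets(["f1", "f2", "f3", "f4"]))
```

Doubling M squares the number of subsets, so heuristic search strategies such as BinSSA become the practical alternative.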
For example: text categorization (Uğuz, 2011), face recognition (Kanan & Faez, 2008), cancer classification (Yu, Gu, Liu, Shen & Zhao, 2009), gene classification (Tabakhi, Najafi, Ranjbar & Moradi, 2015), finance (Huang & Tsai, 2009), recommender systems (Ramezani, Moradi & Tab, 2013), and customer relationship management (Kuri-Morales & Rodríguez-Erazo, 2009). Many feature selection algorithms employ heuristic or random search strategies to find the most appropriate or optimal feature subset while reducing computation time. Feature selection methods can be categorized into filter (Ke, Feng & Ren, 2008; Sun, 2007; Yang & Honavar, 1998), wrapper (Abe, 2005; Chun-Nan, Hung-Ju & Dietrich, 2002; Guan, Liu & Qi, 2004; Qiao, Peng & Peng, 2006), hybrid (Huang, Cai & Xu, 2007; Kabir, Shahjahan & Murase, 2012; Sivagaminathan & Ramakrishnan, 2007), and embedded (Dash & Liu, 1997; Huan & Lei, 2005; Lai, Reinders & Wessels, 2006) approaches. The filter approach is not based on any particular learn-
https://doi.org/10.1016/j.eswa.2020.113185
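Two ingredients discussed so far, the S-/V-shaped transfer functions that binarize a continuous position and a wrapper-style fitness that a classifier-driven search would minimize, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the sigmoid and |tanh| are common members of the S and V families, and the weight alpha is an illustrative value, not necessarily the fixed parameter the authors study.

```python
import math
import random

def s_shaped(x):
    """S-shaped (sigmoid) transfer function: maps a continuous
    position component to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x):
    """V-shaped transfer function: |tanh(x)|, a common member of
    the V family, maps a continuous value into [0, 1)."""
    return abs(math.tanh(x))

def binarize(position, transfer, rng=random.random):
    """Map a continuous position vector to a binary feature mask by
    comparing each transfer value against a uniform random draw."""
    return [1 if transfer(x) > rng() else 0 for x in position]

def wrapper_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """A common wrapper-style fitness: a weighted sum of the
    classifier's error rate and the selected-feature ratio
    (alpha is illustrative). Lower is better."""
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

# Example: binarize a continuous spider position (fixed draw of 0.5
# for reproducibility), then score the resulting feature subset.
mask = binarize([2.5, -3.0, 0.8], s_shaped, rng=lambda: 0.5)
score = wrapper_fitness(error_rate=0.10, n_selected=sum(mask), n_total=3)
```

In a full wrapper search, `error_rate` would come from evaluating a classifier such as K-NN or SVM on the subset encoded by `mask`; the heuristic then keeps the mask with the lowest fitness.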