Expert Systems With Applications 146 (2020) 113185
An efficient binary social spider algorithm for the feature selection problem

Emine BAŞ a,∗, Erkan ÜLKER b

a Kulu Vocational School, Selçuk University, 42075 Konya, Turkey
b Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Konya Technical University, 42075 Konya, Turkey
Article info

Article history:
Received 8 July 2019
Revised 19 November 2019
Accepted 5 January 2020
Available online 7 January 2020

Keywords:
Social spider algorithm
Feature selection
Classifiers

Abstract
The social spider algorithm (SSA) is a heuristic algorithm inspired by spider behaviors and designed to solve continuous problems. In this paper, a binary version of the social spider algorithm, called the binary social spider algorithm (BinSSA), is first proposed; the binary version of SSA has so far received little attention in the literature. The core component of the binary version is the transfer function, which is responsible for mapping the continuous search space to the binary search space. In this study, eight transfer functions divided into two families, S-shaped and V-shaped, are evaluated. BinSSA is obtained from SSA by transforming the continuous search space into a binary search space with these eight transfer functions, yielding eight variations of BinSSA: BinSSA1, BinSSA2, BinSSA3, BinSSA4, BinSSA5, BinSSA6, BinSSA7, and BinSSA8. To increase the exploration and exploitation capacity of BinSSA, a crossover operator is added, producing BinSSA-CR. Secondly, the performance of the BinSSA variations is tested on the feature selection task, where finding the optimal subset of features is a challenging problem. The best BinSSA variation is identified according to several comparison criteria (mean, standard deviation, best, and worst of the fitness values; accuracy; mean number of selected features; and CPU time). In the feature selection problem, K-nearest neighbor (K-NN) and support vector machines (SVM) are used as classifiers. A detailed study is performed for the fixed parameter values used in the fitness function. BinSSA is evaluated on twenty-one well-known low-, middle-, and large-scale UCI datasets, and the results are compared with state-of-the-art algorithms from the literature. The results show that BinSSA and BinSSA-CR achieve superior performance and offer high-quality, stable solutions.
© 2020 Elsevier Ltd. All rights reserved.
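The abstract's central mechanism, mapping a continuous position value to a bit through an S-shaped or V-shaped transfer function, can be sketched as follows. The eight specific functions used in the paper are not listed in this excerpt, so the sigmoid and |tanh| below are common representatives of each family, and the update rules follow the usual convention for binary metaheuristics (an assumption, not the paper's exact formulation):

```python
import math
import random

def s_shaped(x):
    # Representative S-shaped transfer function (sigmoid); maps R -> (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x):
    # Representative V-shaped transfer function; maps R -> [0, 1).
    return abs(math.tanh(x))

def binarize_s(x, rng=random):
    # S-shaped rule: the transfer value is used directly as the
    # probability that the new bit is 1.
    return 1 if rng.random() < s_shaped(x) else 0

def binarize_v(x, current_bit, rng=random):
    # V-shaped rule: the transfer value is the probability of
    # flipping the current bit; otherwise the bit is kept.
    return 1 - current_bit if rng.random() < v_shaped(x) else current_bit
```

In a binary SSA, each dimension of a spider's continuous position would be passed through one of these rules to obtain a 0/1 feature mask, which is what distinguishes the eight BinSSA variations from one another.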
1. Introduction
Classification is one of the most common tasks in data mining (Yin & Gai, 2015). Because of the vast growth of data worldwide, a pre-processing procedure such as feature selection has become a challenging and fundamental task in data mining applications (Jensen, 2005). Feature selection improves the performance of the learning algorithm by removing repeated and irrelevant features from the data set (Liu & Motoda, 1998; Pal & Maiti, 2010). In feature selection, finding the optimal subset is a decisive issue. Exhaustive search can produce all possible subsets by examining the entire set of features. This
approach is impractical for large datasets and has an extremely high computational cost, because if a dataset holds M features, then there are 2^M subsets of features (Guyon & Elisseeff, 2003; Yin, Gai & Wang, 2016).

∗ Corresponding author. E-mail addresses: emineozcan@selcuk.edu.tr (E. BAŞ), eulker@ktun.edu.tr (E. ÜLKER).

Feature selection is applied in many different areas, for example: text categorization (Uğuz, 2011), face
recognition (Kanan & Faez, 2008), cancer classification (Yu, Gu, Liu,
Shen & Zhao, 2009), gene classification (Tabakhi, Najafi, Ranjbar &
Moradi, 2015), finance (Huang & Tsai, 2009), recommender sys-
tems (Ramezani, Moradi & Tab, 2013) and customer relationship
management (Kuri-Morales & Rodríguez-Erazo, 2009). Many feature selection algorithms employ heuristic or random search strategies to find the most appropriate or optimal feature subset while reducing computation time. Feature selection methods can be
categorized into filter (Ke, Feng & Ren, 2008; Sun, 2007; Yang &
Honavar, 1998), wrapper (Abe, 2005; Chun-Nan, Hung-Ju & Diet-
rich, 2002; Guan, Liu & Qi, 2004; Qiao, Peng & Peng, 2006), hy-
brid (Huang, Cai & Xu, 2007; Kabir, Shahjahan & Murase, 2012;
Sivagaminathan & Ramakrishnan, 2007) and embedded (Dash &
Liu, 1997; Huan & Lei, 2005; Lai, Reinders & Wessels, 2006) ap-
proaches. The filter approach is not based on any particular learn-
https://doi.org/10.1016/j.eswa.2020.113185
0957-4174/© 2020 Elsevier Ltd. All rights reserved.
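The introduction's point that a dataset with M features admits 2^M candidate subsets, which makes exhaustive search impractical, can be made concrete with a minimal enumeration sketch (the feature names are illustrative, not from any dataset in the paper):

```python
from itertools import combinations

def all_feature_subsets(features):
    # Enumerate every subset of a feature list, from the empty set
    # up to the full set; there are 2**len(features) of them in total.
    for r in range(len(features) + 1):
        yield from combinations(features, r)

features = ["f1", "f2", "f3", "f4"]
subsets = list(all_feature_subsets(features))
print(len(subsets))  # 2**4 = 16
```

Already at M = 30 this enumeration would produce over a billion subsets, which is why heuristic approaches such as BinSSA search the space selectively instead.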