Applied Soft Computing Journal 93 (2020) 106402
Feature selection based on improved binary global harmony search for
data classification
Jafar Gholami a, Farhad Pourpanah b, Xizhao Wang c,*

a Department of Computer Engineering, Kermanshah Science and Research Branch, Islamic Azad University, Kermanshah, Iran
b College of Mathematics and Statistics, Shenzhen University, China
c College of Computer Science and Software Engineering, Guangdong Key Lab. of Intelligent Information Processing, Shenzhen University, China
Article info
Article history:
Received 16 February 2020
Received in revised form 20 April 2020
Accepted 12 May 2020
Available online 22 May 2020
Keywords:
Feature selection
Population-based optimization
Binary harmony search
Data classification
Abstract
Harmony search (HS) is an effective meta-heuristic algorithm inspired by the music improvisation
process, where musicians search for a pleasing harmony by adjusting their instruments’ pitches. The
HS algorithm and its variants have been widely used to solve binary and continuous optimization
problems. In this paper, we propose an improved binary global harmony search algorithm, called
IBGHS, to undertake feature selection problems. A modified improvisation step is introduced to
enhance the global search ability and increase the convergence speed of the algorithm. In addition,
the K-nearest neighbor (KNN) classifier is used as the underlying learning model to evaluate the effectiveness of
the selected feature subsets. The experimental results on eighteen benchmark problems indicate that
the proposed IBGHS algorithm produces results comparable with those of other state-of-the-art population-based methods, such as the genetic algorithm (GA), particle swarm optimization (PSO), the ant lion optimizer (ALO), novel global harmony search (NGHS), and the whale optimization algorithm (WOA), in solving feature selection problems.
© 2020 Elsevier B.V. All rights reserved.
1. Introduction
Feature selection (FS) is an optimization problem that plays an
important role in tackling classification problems. It is a process
of selecting an optimal subset of features from a data set so
that the classifier can obtain better accuracy and/or reduce the
computational burden. Nonetheless, removing irrelevant features is challenging and time-consuming, owing to the large search space and the complex interactions among features [1–4].
FS techniques can be grouped into filter-, embedded- and
wrapper-based methods [5]. The filter-based methods use the
properties of the learning samples, such as distance and similar-
ity, during the FS process [6]. Embedded-based methods search
for the best feature subset during the training process, in order
to reduce the computational burden [7]. In contrast, wrapper-based methods use a classification algorithm to evaluate the quality of candidate feature subsets, together with a search mechanism to find the optimal ones. Among the three, wrapper-based methods are the most effective, since the classifier acts as a feedback mechanism that computes the fitness of each selected feature subset; however, they are computationally expensive [8].
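The wrapper-style evaluation described above can be sketched as follows. The toy data set, the leave-one-out protocol, and the pure-Python 1-NN classifier are illustrative assumptions for this sketch, not the experimental setup used in the paper.

```python
import math

def one_nn_loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier restricted to the
    features where mask[j] == 1; this accuracy serves as the wrapper's
    fitness signal for the candidate subset."""
    idx = [j for j, m in enumerate(mask) if m == 1]
    if not idx:  # empty subset: nothing to classify with
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, best_lbl = float("inf"), None
        for k in range(len(X)):
            if k == i:  # leave the query point out
                continue
            d = math.dist([X[i][j] for j in idx], [X[k][j] for j in idx])
            if d < best_d:
                best_d, best_lbl = d, y[k]
        correct += (best_lbl == y[i])
    return correct / len(X)

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.1], [0.2, 0.3], [0.1, 9.9], [1.0, 0.2], [1.2, 7.7], [0.9, 3.3]]
y = [0, 0, 0, 1, 1, 1]

print(one_nn_loo_accuracy(X, y, [1, 0]))  # informative feature only: 1.0
print(one_nn_loo_accuracy(X, y, [0, 1]))  # noisy feature only: 0.0
```

The search mechanism (here, any subset generator) would call this fitness for each candidate mask, which is exactly why wrapper methods are accurate but computationally expensive: every evaluation retrains or re-queries the classifier.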
* Corresponding author.
E-mail addresses: gholami2018@gmail.com (J. Gholami), farhad@szu.edu.cn (F. Pourpanah), xizhaowang@ieee.org (X. Wang).
Traditional wrapper-based FS methods, such as sequential
backward selection (SBS) [9] and sequential forward selection
(SFS) [10], improve the performance of the learning model via
sequentially adding or removing features from the data set. In these
methods, once features are removed or added, they cannot be
updated in subsequent steps. This problem was later solved by integrating a floating technique into SBS and SFS [11]. However, these methods still suffer from nesting effects and are computationally expensive [12].
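The greedy, irrevocable behavior of SFS described above can be sketched in a few lines. The scoring function here is a stand-in assumption (a simple additive weight per feature), not a classifier.

```python
def sfs(n_features, score, k):
    """Sequential forward selection: greedily grow a subset of size k by
    adding, at each step, the feature that maximizes the score. Once
    added, a feature is never removed - the source of nesting effects."""
    selected = []
    for _ in range(k):
        candidates = [f for f in range(n_features) if f not in selected]
        best = max(candidates, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected

# Toy additive score: subset value is the sum of per-feature weights,
# so SFS picks features in descending weight order.
weights = [0.1, 0.9, 0.4, 0.7]
score = lambda subset: sum(weights[f] for f in subset)
print(sfs(4, score, 2))  # [1, 3]
```

SBS is the mirror image: start from the full set and greedily remove the least useful feature. The floating variants of [11] interleave add and remove steps to revisit earlier choices, at additional computational cost.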
To alleviate these problems, population-based optimization al-
gorithms, such as particle swarm optimization (PSO) [13–15],
genetic algorithm (GA) [16–18], genetic programming (GP) [19,
20], ant colony optimization (ACO) [21], brain storm optimization
(BSO) [22,23] and harmony search (HS) [24], have been used.
These algorithms start with a set of randomly generated solu-
tions, and use a fitness function to evaluate them. Then, they
generate new solutions based on the individuals that performed
better in the previous iteration. As a result, these algorithms
reduce the computational burden as they avoid generating new
individuals similar to the low-quality ones.
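The generic population-based loop just described, initialize randomly, evaluate fitness, and derive new solutions from the better individuals, can be sketched as follows. The bit-flip mutation, elitist parent choice, and OneMax fitness are illustrative assumptions; each named algorithm (PSO, GA, HS, ...) instantiates this loop with its own update rules.

```python
import random

def evolve(fitness, n_bits=8, pop_size=10, iters=50, seed=0):
    """Minimal elitist population-based search over binary strings."""
    rng = random.Random(seed)
    # Step 1: a set of randomly generated solutions.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(iters):
        # Step 2: evaluate and rank by the fitness function.
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # better half survives
        # Step 3: new solutions are derived from the better individuals,
        # so effort is not wasted near low-quality ones.
        children = []
        for parent in elite:
            child = parent[:]
            child[rng.randrange(n_bits)] ^= 1  # flip one random bit
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Toy fitness: count of ones (OneMax); the optimum is the all-ones string.
best = evolve(sum)
print(best)
```

In binary feature selection, the bit string is read as a feature mask and the fitness would be a wrapper score such as KNN accuracy, optionally penalized by subset size.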
Among them, harmony search (HS) [25] is an effective meta-
heuristic algorithm inspired by the music improvisation process
of probing for a better state of harmony. HS has been widely ap-
plied to solve real-world optimization problems, such as control
system [26] and financial management [27], owing to its simple structure, ease of implementation, and small number of parameters [28]. However, the basic HS algorithm suffers from several limitations, such as
https://doi.org/10.1016/j.asoc.2020.106402