Informatics in Medicine Unlocked 24 (2021) 100572
Available online 14 April 2021
2352-9148/© 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Memory based cuckoo search algorithm for feature selection of gene
expression dataset
Malek Alzaqebah, PhD a,b, Khaoula Briki a,b, Nashat Alrefai a,b, Sami Brini a,b, Sana Jawarneh c,
Mutasem K. Alsmadi d,*, Rami Mustafa A. Mohammad e, Ibrahim ALmarashdeh d,
Fahad A. Alghamdi d, Nahier Aldhafferi f, Abdullah Alqahtani f
a Department of Mathematics, College of Science, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, City of Dammam, Saudi Arabia
b Basic & Applied Scientific Research Center, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, Dammam, Saudi Arabia
c Department of Computer Science, Community College, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, City of Dammam, Saudi Arabia
d Department of MIS, College of Applied Studies and Community Service, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, City of Dammam, Saudi Arabia
e Computer Information Systems Department, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam, Saudi Arabia
f Department of Computer Information Systems, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, City of Dammam, Saudi Arabia
ARTICLE INFO
Keywords:
Cuckoo-search-algorithm
Feature-selection
Classification
Microarray
Cancer-prediction and memory-based-methods
ABSTRACT
Cancer prediction is an important task in cancer research. This importance has prompted
many researchers to investigate machine learning approaches that predict cancer outcomes from gene
expression datasets. Such a dataset consists of many genes (features), and some of these features can
mislead machine learning methods and cause confusion or inaccurate classification. Since finding the most
informative genes for cancer prediction is challenging, feature selection techniques are recommended to pick
important and relevant features out of large and complex datasets. In this research, we propose the Cuckoo
search method as a feature selection algorithm, guided by a memory-based mechanism that saves the most
informative features identified by the best solutions. The purpose of the memory is to keep track of the
selected features at every iteration and to find the features that enhance classification accuracy. The proposed
algorithm was compared with the original algorithm on microarray datasets and was shown to produce
good results relative to the original and contemporary algorithms.
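As an informal illustration of the memory idea summarized in the abstract, the following Python sketch records which features appear in the best solution at each iteration and ranks them by how often they were selected. The abstract does not specify how the memory is stored or updated, so the frequency-count design and the names `FeatureMemory`, `record`, and `top_features` are assumptions made for illustration only, not the authors' implementation.

```python
from collections import Counter


class FeatureMemory:
    """Hypothetical memory: counts how often each feature index appears
    in the best solution found at each iteration (an assumed design)."""

    def __init__(self):
        self.counts = Counter()   # feature index -> number of times selected
        self.iterations = 0       # number of best solutions recorded

    def record(self, best_mask):
        """Store the features selected by the current best solution.

        best_mask is a sequence of booleans, one per feature (gene).
        """
        self.iterations += 1
        self.counts.update(j for j, keep in enumerate(best_mask) if keep)

    def top_features(self, k):
        """Return the k most frequently selected feature indices.

        Ties are broken by the lower feature index.
        """
        ranked = sorted(self.counts.items(), key=lambda item: (-item[1], item[0]))
        return [j for j, _ in ranked[:k]]
```

Under this assumption, a search loop would call `record` once per iteration with the current best feature mask and could then consult `top_features` to bias new candidate solutions toward frequently selected genes.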
1. Introduction
Recently, feature selection has become an attractive topic in cancer research.
Feature selection is recognized as an NP-hard problem [1,2]. The
complexity of the problem arises in selecting the most informative features,
those that help prediction methods classify the data with high accuracy
while using a minimum number of features and maintaining satisfactory
performance.
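The trade-off described above, high classification accuracy with as few features as possible, is often expressed as a single fitness value for a candidate feature subset. The sketch below is a minimal illustration under stated assumptions: the weighted-sum form, the weight `alpha`, the leave-one-out 1-nearest-neighbour evaluator, and the function names are all hypothetical choices and are not taken from this paper.

```python
def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    uses only the features whose mask entry is True."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue  # leave the query sample out of the training set
            dist = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def subset_fitness(X, y, mask, alpha=0.99):
    """Assumed fitness: alpha * accuracy + (1 - alpha) * feature reduction.

    An empty subset gets fitness 0.0 so that it is never preferred.
    """
    if not any(mask):
        return 0.0
    accuracy = loo_1nn_accuracy(X, y, mask)
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * accuracy + (1.0 - alpha) * reduction
```

With a weight close to 1, accuracy dominates and the subset size mainly acts as a tie-breaker between subsets of similar accuracy; a search method such as Cuckoo search would then try to maximize this value over binary feature masks.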
Cancer is a disease that occurs when one or more cells start to undergo
mutation. This can happen during cell growth when the cell starts
to behave abnormally, for example by replicating itself uncontrollably. With
a series of mutations, the cancer cell might spread to other body parts
and thus affect other cells as well.
Today, cancer classification research uses advanced techniques such as
microarray technology. Microarrays make it possible to
measure the expression of thousands of genes simultaneously, with the gene expression
dataset as the output [3]. This technology has also been successfully
applied to many problems and has achieved superior results compared
with other techniques, particularly in the medical field. Microarrays have also
shown the ability to diagnose patients who have a specific disease.
Thus, this technology is used to detect diseases such as cancer. The most
important weakness of microarray datasets is their high dimensionality
and the complex interrelationships between features [4]. To solve these
issues, irrelevant genes should be eliminated and the dimensionality of
* Corresponding author.
E-mail addresses: maafehaid@iau.edu.sa (M. Alzaqebah), kabriki@iau.edu.sa (K. Briki), nalrefai@iau.edu.sa (N. Alrefai), ssbrini@iau.edu.sa (S. Brini),
sijawarneh@iau.edu.sa (S. Jawarneh), mkalsmadi@iau.edu.sa (M.K. Alsmadi).
https://doi.org/10.1016/j.imu.2021.100572
Received 1 February 2021; Received in revised form 29 March 2021; Accepted 31 March 2021