Informatics in Medicine Unlocked 24 (2021) 100572
Available online 14 April 2021
2352-9148/© 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Journal homepage: http://www.elsevier.com/locate/imu
https://doi.org/10.1016/j.imu.2021.100572
Received 1 February 2021; Received in revised form 29 March 2021; Accepted 31 March 2021

Memory based cuckoo search algorithm for feature selection of gene expression dataset

Malek Alzaqebah, PhD a,b; Khaoula Briki a,b; Nashat Alrefai a,b; Sami Brini a,b; Sana Jawarneh c; Mutasem K. Alsmadi d,*; Rami Mustafa A. Mohammad e; Ibrahim ALmarashdeh d; Fahad A. Alghamdi d; Nahier Aldhafferi f; Abdullah Alqahtani f

a Department of Mathematics, College of Science, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, City of Dammam, Saudi Arabia
b Basic & Applied Scientific Research Center, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, Dammam, Saudi Arabia
c Department of Computer Science, Community College, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, City of Dammam, Saudi Arabia
d Department of MIS, College of Applied Studies and Community Service, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, City of Dammam, Saudi Arabia
e Computer Information Systems Department, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam, Saudi Arabia
f Department of Computer Information Systems, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, City of Dammam, Saudi Arabia

* Corresponding author.
E-mail addresses: maafehaid@iau.edu.sa (M. Alzaqebah), kabriki@iau.edu.sa (K. Briki), nalrefai@iau.edu.sa (N. Alrefai), ssbrini@iau.edu.sa (S. Brini), sijawarneh@iau.edu.sa (S. Jawarneh), mkalsmadi@iau.edu.sa (M.K. Alsmadi).

ARTICLE INFO

Keywords: Cuckoo search algorithm; Feature selection; Classification; Microarray; Cancer prediction; Memory-based methods

ABSTRACT

Cancer prediction is an important topic in cancer research, which has prompted many researchers to examine machine learning approaches for predicting cancer outcomes from gene expression datasets. Such datasets contain many genes (features), some of which can mislead machine learning methods and cause confused or inaccurate classification. Since finding the most informative genes for cancer prediction is challenging, feature selection techniques are recommended for picking the important and relevant features out of large and complex datasets. In this research, we propose the cuckoo search method as a feature selection algorithm, guided by a memory-based mechanism that saves the most informative features identified by the best solutions. The purpose of the memory is to keep track of the selected features at every iteration and to find the features that enhance classification accuracy. The proposed algorithm was compared with the original cuckoo search algorithm on microarray datasets and produced good results relative to both the original and contemporary algorithms.

1. Introduction

Recently, feature selection has become an attractive topic in cancer research. Feature selection is recognized as an NP-hard problem [1,2]. The difficulty lies in selecting the most informative features, so that prediction methods can classify the data with high accuracy and satisfactory performance while using a minimum number of features.

Cancer is a disease that occurs when one or more cells start to undergo mutation. This can happen during cell growth, when a cell starts to behave abnormally, for example by replicating itself uncontrollably. After a series of mutations, the cancer cells may spread to other parts of the body and affect other cells as well. Today, cancer classification research uses advanced techniques such as microarray technology. Microarrays can measure the expression of thousands of genes simultaneously, producing a gene expression dataset as output [3]. This technology has been applied successfully to many problems and has achieved superior results compared with other techniques, particularly in the medical field. Microarrays have also shown the ability to identify patients who have a specific disease, and the technology is therefore used to detect diseases such as cancer. The most important weaknesses of microarray datasets are their large dimensionality and the complex interrelationships between features [4]. To address these issues, irrelevant genes should be eliminated and the dimensionality of