International Journal of Electrical and Computer Engineering (IJECE)
Vol. 14, No. 2, April 2024, pp. 2068~2075
ISSN: 2088-8708, DOI: 10.11591/ijece.v14i2.pp2068-2075
Journal homepage: http://ijece.iaescore.com

An overlapping conscious relief-based feature subset selection method

Nishat Tasnim Mim 1, Md. Eusha Kadir 2, Suravi Akhter 3, Muhammad Asif Hossain Khan 4
1 Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh
2 Institute of Information Technology, Noakhali Science and Technology University, Noakhali, Bangladesh
3 Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka, Bangladesh
4 Department of Computer Science and Engineering, Faculty of Engineering and Technology, University of Dhaka, Dhaka, Bangladesh

Article history: Received Apr 21, 2023; Revised Sep 6, 2023; Accepted Nov 4, 2023

ABSTRACT
Feature selection is considered a fundamental pre-processing step in various data mining and machine learning based works. The quality of features is essential for achieving good classification performance and for better data analysis. Among the many feature selection methods, distance-based methods are gaining popularity because of their ability to capture feature interdependency and relevance to the class endpoints. However, most distance-based methods only rank the features and ignore class overlapping issues. Features with class-overlapping data act as an obstacle during classification. Therefore, the objective of this research work is to propose a method named overlapping conscious MultiSURF (OMsurf) to handle data overlapping and to select a subset of informative features while discarding the noisy ones. Experimental results over 20 benchmark datasets demonstrate the superiority of OMsurf over six existing state-of-the-art methods.
Keywords: Class overlapping; Distance-based method; Feature selection; Relief-based method; Reward-penalty

This is an open access article under the CC BY-SA license.

Corresponding Author:
Muhammad Asif Hossain Khan
Department of Computer Science and Engineering, Faculty of Engineering and Technology, University of Dhaka
Nilkhet Road, Dhaka 1000, Bangladesh
Email: asif@du.ac.bd

1. INTRODUCTION
We are now living in the age of modern technologies. The rapid growth and wide use of technologies generate a huge amount of data, which imposes a challenge for data scientists and engineers to manage these data effectively and efficiently. Data engineers usually find patterns and relationships by analyzing the data with the assistance of data mining and machine learning techniques. However, an issue known as the curse of dimensionality arises in high-dimensional data, which may mislead the learning phase of machine learning techniques. To achieve the desired performance in a variety of application domains (e.g., bioinformatics, life science, health care, and cyber security), high-dimensional data often need to be pre-processed before applying machine learning techniques [1]-[4]. One such data pre-processing technique is feature selection. Feature selection is the process of identifying informative features and eliminating noisy and irrelevant features from the original feature set. Over the years, several feature selection methods have been proposed, which can be broadly grouped into two types, namely wrapper and filter methods. Wrapper methods search for an optimal feature set based on a specific machine learning technique [5]-[7]; thus, the selection procedure is highly dependent on the nature of that technique. On the contrary, filter methods use statistical approaches to measure the dependency between feature and target, and
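To make the distance-based (relief-based) family of filter methods referenced above concrete, the sketch below implements the classic Relief weighting scheme in plain Python: for each sampled instance, the nearest same-class neighbour (hit) decreases a feature's weight while the nearest other-class neighbour (miss) increases it, so features that separate the classes accumulate high scores. This is a minimal illustration of the family that MultiSURF and the proposed OMsurf belong to; it does not reproduce the paper's overlap-handling or reward-penalty mechanism, and all function and variable names are our own.

```python
import random

def diff(f, a, b, ranges):
    """Normalized per-feature distance for numeric features."""
    return abs(a[f] - b[f]) / ranges[f] if ranges[f] else 0.0

def relief_scores(X, y, n_samples=None, seed=0):
    """Return one relevance weight per feature (classic Relief sketch).

    Each sampled instance pulls weights down toward its nearest hit
    (same class) and up toward its nearest miss (different class).
    """
    n, d = len(X), len(X[0])
    # Feature ranges are used to normalize distances to [0, 1].
    ranges = [max(x[f] for x in X) - min(x[f] for x in X) for f in range(d)]
    rng = random.Random(seed)
    idx = rng.sample(range(n), n_samples or n)
    w = [0.0] * d
    for i in idx:
        dist = lambda j: sum(diff(f, X[i], X[j], ranges) for f in range(d))
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        h, m = min(hits, key=dist), min(misses, key=dist)
        for f in range(d):
            w[f] += (diff(f, X[i], X[m], ranges)
                     - diff(f, X[i], X[h], ranges)) / len(idx)
    return w

# Toy example: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.8, 0.4], [0.9, 0.8], [1.0, 0.1]]
y = [0, 0, 0, 1, 1, 1]
scores = relief_scores(X, y)
# The discriminative feature 0 should score well above the noise feature 1.
```

Methods such as MultiSURF refine this idea by using an adaptive, instance-specific neighbourhood threshold instead of a single nearest hit and miss; the key point for this paper is that all of these methods produce a ranking only, without accounting for class overlap.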