Indonesian Journal of Electrical Engineering and Computer Science
Vol. 32, No. 2, November 2023, pp. 1150~1158
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v32.i2.pp1150-1158
Journal homepage: http://ijeecs.iaescore.com

Clustering performance using k-modes with modified entropy measure for breast cancer

Nurshazwani Muhamad Mahfuz¹,², Heru Suhartanto³, Kusmardi Kusmardi⁴,⁵, Marina Yusoff¹,²

¹ College of Computing, Informatic and Media, Kompleks Al-Khawarizmi, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
² Institute for Big Data Analytics and Artificial Intelligence (IBDAAI), Kompleks Al-Khawarizmi, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
³ Faculty of Computer Science, Universitas Indonesia, Depok, Indonesia
⁴ Department of Anatomical Pathology, Faculty of Medicine, Universitas Indonesia/Cipto Mangunkusumo Hospital, Jakarta, Indonesia
⁵ Human Cancer Research Cluster, Indonesia Medical Education, and Research Institute, Universitas Indonesia, Jakarta, Indonesia

Article Info
Article history: Received Jul 10, 2023; Revised Jul 14, 2023; Accepted Aug 10, 2023

ABSTRACT
Breast cancer is a serious disease that requires data analysis for diagnosis and treatment. Clustering is a data mining technique that is often used in breast cancer research to assess the level of malignancy at an early stage. However, clustering categorical data can be challenging because different levels in categorical variables can impact the clustering process. This research proposes a modified entropy measure (MEM) to enhance clustering performance. MEM aims to address the issue of distance-based measures in clustering categorical data. It is also a useful tool for assessing data loss in categorical clustering, which helps to understand the patterns and relationships by quantifying the information lost during clustering. An evaluation compares k-modes+MEM, k-means+MEM, DBSCAN+MEM, and affinity+MEM with conventional clustering algorithms.
The assessment metrics of clustering accuracy, intra-cluster distance, and Fowlkes-Mallows index (FMI) are employed to evaluate the algorithm performance. Experimental results show significant improvements. The k-modes+MEM algorithm achieves a reduction in average intra-cluster distance and outperforms the other algorithms in accuracy, intra-cluster distance, and FMI. The proposed algorithm can be extended to heterogeneous datasets in various domains such as healthcare, finance, and marketing.

Keywords: Categorical data; Clustering; Distance metric; Entropy measure; Evaluation performance

This is an open access article under the CC BY-SA license.

Corresponding Author:
Marina Yusoff
Institute for Big Data Analytics and Artificial Intelligence (IBDAAI), Kompleks Al-Khawarizmi
Universiti Teknologi MARA
Shah Alam, Selangor, Malaysia
Email: marina998@uitm.edu.my

1. INTRODUCTION
Cancer is a major contributor to global mortality, and breast cancer ranks as the second leading cause of cancer-related deaths among women [1][3]. Breast cancer is a prevalent and potentially life-threatening disease that requires accurate diagnosis and effective treatment strategies [4]. Data analysis techniques have advanced significantly in recent years, offering valuable insights into complex datasets [5]. Clustering is one such technique; it has gained prominence in breast cancer research for its ability to uncover distinct patterns and relationships within the data, thereby aiding improved decision-making and patient care.
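
The abstract names the Fowlkes-Mallows index (FMI) as one of the evaluation metrics. As a point of reference, the following is a minimal sketch of the standard pair-counting definition, FMI = TP / sqrt((TP + FP)(TP + FN)), where TP counts pairs of records grouped together in both the reference labels and the clustering. The function name `fowlkes_mallows` and the toy labels are illustrative assumptions and are not taken from the paper, which does not state how the metric was implemented.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Pair-counting FMI: TP / sqrt((TP + FP) * (TP + FN)).

    Illustrative helper (hypothetical name), not the authors' code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    tp = fp = fn = 0
    # Examine every unordered pair of records exactly once.
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # together in both partitions
        elif same_pred:
            fp += 1  # together only in the predicted clustering
        elif same_true:
            fn += 1  # together only in the reference labels

    if tp == 0:
        return 0.0  # no agreeing pairs; also avoids division by zero
    return tp / sqrt((tp + fp) * (tp + fn))


# Toy example: cluster ids are arbitrary, so a pure relabelling scores 1.0.
reference = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]
partial = [0, 0, 0, 1]
score_relabelled = fowlkes_mallows(reference, relabelled)
score_partial = fowlkes_mallows(reference, partial)
```

The double loop over pairs is quadratic in the number of records, which is acceptable for a small illustration; a contingency-table formulation would be the usual choice for larger datasets.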