Clustering-based approach for detecting breast cancer recurrence

Smaranda Belciug
Department of Computer Science, Faculty of Mathematics and Computer Science, University of Craiova, Romania
smaranda.belciug@inf.ucv.ro

Abdel-Badeeh Salem
Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
absalem@asunet.shams.edu.eg

Florin Gorunescu
Chair of Mathematics, Biostatistics and Informatics, University of Medicine and Pharmacy of Craiova, Romania
fgorun@rdslink.ro

Marina Gorunescu
Department of Computer Science, Faculty of Mathematics and Computer Science, University of Craiova, Romania
mgorun@inf.ucv.ro

Abstract – This paper assesses the effectiveness of three clustering algorithms used to detect breast cancer recurrent events. The performance of a classical k-means algorithm is compared with that of a much more sophisticated Self-Organizing Map (SOM, or Kohonen network) and of a cluster network, which is closely related to both k-means and SOM. The three clustering algorithms were applied to a concrete breast cancer dataset, and the results clearly showed that the best performance was obtained by the cluster network, followed by SOM and k-means, with predictive accuracy ranging from 62% to 78%. Based on the patients’ segmentation regarding the occurrence of recurrent events, new patients may be labeled according to their medical characteristics as developing or not developing recurrent events, thus supporting health professionals in making informed decisions.

Keywords – clustering algorithms, k-means, self-organizing map network, cluster network, breast cancer recurrence

I. INTRODUCTION

Breast cancer is one of the most common cancers among women, and the second leading cause of cancer deaths today. According to the American Cancer Society, the chance of a woman having invasive breast cancer some time during her life is a little less than 1 in 8, and the chance of dying from breast cancer is about 1 in 35.
Fortunately, death rates have been going down recently, because the disease is caught earlier and the body responds better to newer treatments. However, the cancer may come back once treatment is completed. For a patient, a recurrence is often more devastating and psychologically more difficult than the initial diagnosis. Accordingly, the medical diagnosis of recurrent cancer is often a more challenging task than the initial one.

Currently, breast cancer detection and recurrence diagnosis are achieved using conventional imaging (CI), or the more complex and much more expensive nuclear imaging, such as MRI (magnetic resonance imaging), PET (positron emission tomography), etc. Let us point out that the average accuracy of these modern but very expensive medical imaging methods in detecting breast cancer recurrent events is about 80%, according to [1], [2]. Hence, intelligent systems that predict breast cancer recurrent events would help health professionals make informed decisions quickly, using low-cost computer technologies.

Recent breast cancer research using machine learning techniques has mainly focused on classification and clustering methods. Among the classification methods, decision trees and neural networks have been used most often. Thus, decision trees have been used for: predicting breast cancer survivability with fuzzy decision trees [3], identifying and extracting important patterns of non-compliance with guidelines, using decision tree and cause analysis [4], and generating knowledge-based diagnostic rules from digital thermographs in breast cancer [5].
Neural networks have been used to address different aspects of breast cancer, such as: detection of breast cancer from radiographic features and patient age with neural networks trained using evolutionary programming [6], differentiating between benign and malignant breast tumors using probabilistic neural networks [7] and different supervised and unsupervised neural networks [8], evaluation of 3D power Doppler ultrasound in the differential diagnosis of solid breast tumors [9], building a medical decision support system for breast cancer based on a hybrid model containing rough sets and probabilistic neural networks [10], evaluating the effectiveness of different neural networks applied to detect recurrent breast cancer events [11], and determination of the breast tumor type from vascularity indices (harmonic and non-harmonic 3D power Doppler imaging) using a multilayer perceptron (MLP) [12].

Clustering methodology has targeted different aspects, such as: breast segmentation into different regions, in order to identify abnormal tissue regions, using standard fuzzy clustering [13], hierarchical clustering for melanoma and breast cancer [14], and identifying core classes in breast cancer using consensus across several different clustering algorithms [15].

978-1-4244-8136-1/10/$26.00 © 2010 IEEE

In this general context, the novelty of this paper consists in assessing the performance of three clustering-