Improving FCM and T2FCM Algorithms Performance using GPUs for Medical Images Segmentation

Mohammed A. Shehab, Mahmoud Al-Ayyoub and Yaser Jararweh
Jordan University of Science and Technology
Irbid, Jordan
Emails: mashehab12@cit.just.edu.jo, {maalshbool, yijararweh}@just.edu.jo

Abstract—Image segmentation has gained popularity recently due to its numerous applications in fields such as computer vision and medical imaging. As its name suggests, segmentation partitions the image into separate regions, one of which is of special interest. Such a region is called the Region of Interest (RoI) and it is very important for many medical imaging problems. Clustering is one of the segmentation approaches typically used on medical images despite its long running time. In this work, we propose to leverage the power of the Graphics Processing Unit (GPU) to improve the performance of such approaches. Specifically, we focus on the Fuzzy C-Means (FCM) algorithm and its more recent variation, the Type-2 Fuzzy C-Means (T2FCM) algorithm. We propose a hybrid CPU-GPU implementation to speed up the execution time without affecting the algorithm's accuracy. The experiments show that such an approach reduces the execution time by up to 80% for FCM and 74% for T2FCM.

Index Terms—Medical Image Segmentation; Fuzzy C-Means; Type-2 Fuzzy C-Means; GPU; CUDA

I. INTRODUCTION

Recently, medical image processing (for the different existing modalities such as magnetic resonance imaging (MRI), computed tomography (CT), digital mammography, etc.) has become more popular due to its obvious benefits in the diagnosis of many diseases. Researchers are continuously trying to come up with more accurate and efficient techniques [1]. However, due to the recent advances in medical image modalities and the increased size and resolution of medical images, the processing capabilities of typical CPUs are no longer suitable.
A recent trend is to exploit the capabilities of Graphics Processing Units (GPUs) in order to improve the performance of medical image processing tasks [2], [3], [4].

Image segmentation is one of the fundamental tasks in image processing. It focuses on how to extract objects from images by separating the image into different regions, one of which is of special interest. Such a region is called the Region of Interest (RoI) and it is very important for many medical imaging problems [5], [6]. For example, segmentation is an integral step in many Computer-Aided Diagnosis (CAD) systems [7], [8], [9], [10]. Many approaches have been proposed for this task, such as threshold-based methods, clustering methods, compression-based methods, histogram-based methods and region-growing methods [11], [12], [13], [14]. We focus here on the clustering techniques for segmentation. Specifically, we are concerned with the celebrated Fuzzy C-Means (FCM) technique [15].

Due to its importance, several enhancements of FCM have appeared over the past three decades, aiming to improve its accuracy and performance. For the latter objective, the authors of [16], [1], [17], [18] proposed using GPU capabilities. GPUs follow a single instruction, multiple data (SIMD) parallel programming model. While both CPUs and GPUs can run and manage thousands of threads simultaneously via time-slicing, modern CPUs can run 4-12 threads in parallel, whereas GPUs can run a thousand threads at a time [1], [19].

In this work, we show how to improve the performance of FCM, as well as a variation of it called Type-2 Fuzzy C-Means (T2FCM), using a GPU. Following the findings of [1], we devise a hybrid CPU-GPU implementation and compare it with a CPU implementation on two medical images.

The structure of this paper is as follows. The following section briefly discusses a few similar works.
Section III presents our methodology, covering both the sequential and the hybrid implementations, and Section IV discusses the experiments we conducted and the results we obtained. Finally, we conclude our work and provide some directions for future research.

II. RELATED WORKS

Most research efforts have focused on improving the accuracy of FCM, while some researchers have focused on improving its performance. For instance, Rowińska et al. [16] implemented FCM on a parallel architecture. They used CUDA to convert the sequential code of FCM to a parallel one. The testing data were composed of colored images of different sizes. They moved two main functions of FCM to the GPU: the membership matrix update and the computation of new centroids ran on the GPU side, while the objective function and the termination condition ran on the CPU side. Their experiments were conducted on an Intel Core i3 machine with an NVIDIA GeForce GTX 560 video card and the Windows 7 64-bit operating system. Their CUDA implementation was tested against two sequential implementations (in C++ and MATLAB). Two types of experiments were conducted, with one- and two-dimensional feature spaces. The GPU implementation was 7 times faster than the
2015 6th International Conference on Information and Communication Systems (ICICS) 978-1-4799-7348-4/15/$31.00 ©2015 IEEE