Received: 26 December 2018
Revised: 14 June 2019
Accepted: 12 August 2019
DOI: 10.1002/cpe.5538

SPECIAL ISSUE PAPER

Improving classification and clustering techniques using GPUs

Yaser Jararweh, Mohammed A. Shehab, Qussai Yaseen, Mahmoud Al-Ayyoub

Jordan University of Science and Technology, Irbid, Jordan

Correspondence
Qussai Yaseen, Jordan University of Science and Technology, Irbid 22110, Jordan.
Email: qmyaseen@just.edu.jo

Summary
Classification and clustering techniques are used in many different applications. Large-scale big data applications, such as social network analysis, need to process large chunks of data in a short time, and the classification and clustering tasks in such applications consume a large share of the processing time. Improving the performance of classification and clustering algorithms therefore improves the performance of the applications that rely on them. This paper introduces an approach for exploiting the graphics processing unit (GPU) platform to improve the performance of classification and clustering algorithms. The proposed approach uses two GPU implementations: a pure (GPU-only) implementation and a GPU-CPU hybrid implementation. The results show that the hybrid implementation, which optimizes the subtask scheduling for both the CPU and the GPU processing elements, outperforms the GPU-only implementation.

KEYWORDS
classification and clustering algorithms, GPU-CPU hybrid implementation, graphics processing unit, social networks analysis

1 INTRODUCTION

The immense growth of networking and internet infrastructure has helped technologies such as the Internet of Things (IoT),[1] cloud computing,[2] machine learning, and other fields of information technology[3] to flourish. These technologies have shaped a new business era, and their applications have enhanced the quality of life.
However, the huge volume of data that such technologies produce is a challenge, since processing and analyzing big data requires powerful resources.[4]

Many techniques are used to segment data into groups based on shared attributes. Clustering algorithms are used to analyze the gigantic datasets produced by modern applications.[5] Furthermore, clustering methods can identify abnormal events or data, which can help to uncover problems, study their causes, and devise solutions.[6,7] There are many efficient clustering algorithms; K-Means (KM) and Fuzzy C-Means (FCM) are two common ones.[8-10] They can work on different types of data, eg, 2D/3D image segmentation,[11,12] community detection in social networks,[13] clustering for gene expression,[14] and textual data.[15]

The execution time of these algorithms is a critical issue. Increasing the data size or the number of attribute dimensions directly increases the execution time. Therefore, some method is needed to mitigate the effect of data size. For example, parallel computing reduces this effect by utilizing a multicore environment.[16] A modern central processing unit (CPU) contains around 32 cores,[17] while a modern graphics processing unit (GPU) has around 4999 cores.[18] With so many more cores, the GPU architecture is better suited to parallel computing than the CPU. Developers therefore employ the capabilities of GPUs in parallel computing, and some of them recommend the collaborative use of the CPU and the GPU (ie, hybrid implementations).[19]

The paper is organized as follows. The next section introduces the related work. Section 3 presents and discusses the proposed methodology. Section 4 provides and analyzes the experiments and results. Section 5 summarizes the work and presents future work.

Concurrency Computat Pract Exper. 2019;e5538. wileyonlinelibrary.com/journal/cpe © 2019 John Wiley & Sons, Ltd.