Received: 26 December 2018 Revised: 14 June 2019 Accepted: 12 August 2019
DOI: 10.1002/cpe.5538
SPECIAL ISSUE PAPER
Improving classification and clustering techniques using GPUs
Yaser Jararweh, Mohammed A. Shehab, Qussai Yaseen, Mahmoud Al-Ayyoub
Jordan University of Science and Technology,
Irbid, Jordan
Correspondence
Qussai Yaseen, Jordan University of Science
and Technology, Irbid 22110, Jordan.
Email: qmyaseen@just.edu.jo
Summary
Classification and clustering techniques are used in different applications. Large-scale big data
applications such as social networks analysis applications need to process large data chunks in
a short time. Classification and clustering tasks in such applications consume a lot of processing
time. Improving the performance of classification and clustering algorithms therefore enhances the
performance of the applications that rely on them. This paper introduces an approach
for exploiting the graphics processing unit (GPU) platform to improve the performance of
classification and clustering algorithms. The proposed approach uses two GPU implementations:
a pure (GPU-only) implementation and a GPU-CPU hybrid implementation.
The results show that the hybrid implementation, which optimizes the subtask scheduling for
both the CPU and the GPU processing elements, outperforms the approach that uses only
the GPU.
KEYWORDS
classification and clustering algorithms, GPU-CPU hybrid implementation, graphics processing
unit, social networks analysis
1 INTRODUCTION
The immense growth of networking and internet infrastructure has helped technologies such as the Internet of Things (IoT),1 cloud computing,2 machine learning, and other fields in information technology3 to flourish. These technologies have shaped a new business era, and their applications have enhanced the quality of life. However, the huge volume of data that such technologies produce is a challenge, since processing and analyzing big data requires powerful resources.4
Many techniques are used to segment data into groups based on similar attributes. Clustering algorithms are used to analyze the gigantic datasets produced by modern applications.5 Furthermore, clustering methods can identify abnormal events or data, which may help to discover problems, study their causes, and devise solutions.6,7
There are many efficient clustering algorithms. K-Means (KM) and Fuzzy C-Means (FCM) are two common ones.8-10 They have been applied to different types of data and tasks, eg, 2D/3D image segmentation,11,12 community detection in social networks,13 clustering of gene-expression data,14 and textual data.15
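To make the two algorithms concrete, the following is a minimal pure-Python sketch of the standard textbook formulations, not the implementation evaluated in this paper. The function names (`sq_dist`, `kmeans`, `fcm_memberships`), the fixed iteration count, the random initialization, and the fuzzifier default `m = 2` are all assumptions made for illustration. K-Means assigns each point to exactly one nearest center, whereas FCM gives each point a degree of membership in every cluster.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Textbook K-Means (Lloyd's iteration) on a list of tuples.

    Assumption: centers are initialized by sampling k input points.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def fcm_memberships(points, centers, m=2.0):
    """FCM membership update: u_ij = 1 / sum_c (d_ij / d_ic)^(2 / (m - 1)).

    Squared distances are used, so the exponent becomes 1 / (m - 1).
    """
    power = 1.0 / (m - 1.0)
    rows = []
    for p in points:
        d2 = [sq_dist(p, c) for c in centers]
        zeros = [j for j, d in enumerate(d2) if d == 0.0]
        if zeros:
            # The point coincides with one or more centers: share full membership.
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d2))]
        else:
            row = [1.0 / sum((dj / dc) ** power for dc in d2) for dj in d2]
        rows.append(row)
    return rows
```

Both routines evaluate the distance from every point to every center in each iteration, which is the part of the computation that grows with the data size and dimensionality.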
The execution time of the aforementioned algorithms is a critical issue. Clearly, increasing the data size and the number of data attributes (dimensions) directly increases the execution time. Therefore, methods are needed to mitigate the effect of data size. For example, parallel computing reduces this effect by exploiting multicore environments.16
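As an illustration of why these algorithms parallelize well, the sketch below splits the nearest-center assignment step into independent chunks and processes them concurrently. It is a hedged example rather than the method proposed in this paper: the helper names (`nearest_center`, `assign_chunk`, `split`, `parallel_assign`) and the default of four workers are hypothetical. Threads are used only to keep the example short. In CPython, pure-Python threads do not run this arithmetic truly in parallel, so the sketch shows the decomposition rather than a speedup; a practical implementation would map the chunks to processes, native code, or GPU threads.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest_center(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for j, c in enumerate(centers):
        d = sum((x - y) ** 2 for x, y in zip(point, c))
        if d < best_dist:
            best_index, best_dist = j, d
    return best_index


def assign_chunk(chunk, centers):
    """Label every point in one chunk; chunks do not depend on each other."""
    return [nearest_center(p, centers) for p in chunk]


def split(points, n_chunks):
    """Split `points` into at most `n_chunks` contiguous chunks."""
    size = max(1, -(-len(points) // n_chunks))  # ceiling division
    return [points[i:i + size] for i in range(0, len(points), size)]


def parallel_assign(points, centers, workers=4):
    """Data-parallel assignment step: one task per chunk, results in input order."""
    chunks = split(points, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(assign_chunk, chunks, [centers] * len(chunks)))
    return [label for chunk_labels in results for label in chunk_labels]
```

Because each point is labeled independently of all others, the chunks need no synchronization until the labels are gathered, which is the property that multicore and GPU implementations exploit.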
A modern central processing unit (CPU) contains around 32 cores,17 while a modern graphics processing unit (GPU) has around 4999 cores.18 With far more cores, the GPU architecture is better suited to parallel computing than the CPU. Therefore, developers employ the capabilities of GPUs in parallel computing, and some of them recommend the collaborative use of the CPU and the GPU (ie, hybrid implementations).19
The paper is organized as follows. The next section introduces the related work. Section 3 presents and discusses the proposed
methodology. Section 4 provides and analyzes the experiments and results. Section 5 summarizes the work and presents future work.
Concurrency Computat Pract Exper. 2019;e5538. wileyonlinelibrary.com/journal/cpe © 2019 John Wiley & Sons, Ltd. 1 of 10
https://doi.org/10.1002/cpe.5538