Citation: Awad, F.H.; Hamad, M.M. Improved k-Means Clustering Algorithm for Big Data Based on
Distributed Smartphone Neural Engine Processor. Electronics 2022, 11, 883.
https://doi.org/10.3390/electronics11060883
Academic Editor: Miin-shen Yang
Received: 25 February 2022
Accepted: 9 March 2022
Published: 11 March 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).
electronics
Article
Improved k-Means Clustering Algorithm for Big Data Based on
Distributed Smartphone Neural Engine Processor
Fouad H. Awad * and Murtadha M. Hamad
College of Computer Science and Information Technology, University of Anbar, Ramadi 31001, Iraq;
dr.mortadha61@uoanbar.edu.iq
* Correspondence: fouad.hammadi@uoanbar.edu.iq
Abstract: Clustering is one of the most significant applications in the big data field. However,
applying clustering to big data requires substantial processing power and resources because of the
complexity involved and the resulting increase in clustering time. Therefore, many techniques
have been implemented to improve the performance of clustering algorithms, especially k-means
clustering. In this paper, a neural-processor-based k-means clustering technique is proposed
that clusters big data by leveraging the dedicated machine learning processors of mobile
devices. The solution was designed to run on the single-instruction machine processor that exists
in the mobile device’s processor. Running k-means clustering in a distributed scheme based
on mobile machine learning can efficiently handle big data clustering over the network. The
results showed that using the neural engine processor of a smartphone can increase the
speed of the clustering algorithm, with clustering performance up to two times faster
than with traditional laptop/desktop processors. Furthermore, the number of iterations
required to obtain k clusters was improved by up to a factor of two compared with
parallel and distributed k-means.
Keywords: big data; clustering; neural engine; k-means; parallel computing
1. Introduction
Currently, we are in a data flood era, as shown by the massive amounts of data that are continuously
generated at unprecedented and ever-increasing scales. In the recent decade, machine
learning techniques have become increasingly popular in a wide range of large and complex
data-intensive applications, such as astronomy, medicine, biology, and other
sciences [1]. These techniques offer potential options for extracting hidden information from
the data. However, with the arrival of the big data era, datasets have grown so large and
complex that they are difficult to handle with conventional learning
methods, as learning processes designed for traditional datasets may not
work well with large amounts of data. Most classical machine learning algorithms are
built to process data that are loaded into memory [2], an assumption that no longer holds in the big
data context.
The massive volumes of data are useful only if meaning can be reliably extracted
from them, so that proper information can guide the right decisions. The
process of gathering information from huge unstructured or semi-structured data is
known as clustering. Clustering groups elements based on the
similarity of their characteristics and returns those groups as clusters. Thousands of
clustering algorithms have been published based on this concept, and k-means is one of the
most used. k-means is applied across a wide range of applications due to its simplicity
of implementation and its effectiveness. In the literature, different modifications have been
proposed for improving the performance and efficiency of the k-means clustering algorithm.
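As a point of reference for the grouping-by-similarity idea described above, the following is a minimal sketch of the standard (Lloyd-style) k-means procedure in plain Python. It is not the improved, neural-engine-based algorithm proposed in this paper. The function names `squared_distance` and `kmeans`, the parameters `max_iter` and `seed`, the random initialization, and the toy data are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Standard k-means sketch (illustrative only, not the paper's method).

    Returns (centroids, assignments, iterations_used).
    Assumes 1 <= k <= len(points) and max_iter >= 1.
    """
    rng = random.Random(seed)
    # Assumption: initialize centroids with k distinct points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = None

    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )

    return centroids, assignments, iteration


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels, n_iter = kmeans(points, k=2)
print(centroids, labels, n_iter)
```

The assignment step compares every point with every centroid in each iteration, which is why both the per-iteration cost and the number of iterations matter when the dataset is large.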
Big data analytics can extract useful information from the large amounts of data
generated by a variety of sources [3]. Although computer systems and Internet technologies