Electronics

Article

Improved k-Means Clustering Algorithm for Big Data Based on Distributed Smartphone Neural Engine Processor

Fouad H. Awad * and Murtadha M. Hamad

College of Computer Science and Information Technology, University of Anbar, Ramadi 31001, Iraq; dr.mortadha61@uoanbar.edu.iq
* Correspondence: fouad.hammadi@uoanbar.edu.iq

Citation: Awad, F.H.; Hamad, M.M. Improved k-Means Clustering Algorithm for Big Data Based on Distributed Smartphone Neural Engine Processor. Electronics 2022, 11, 883. https://doi.org/10.3390/electronics11060883

Academic Editor: Miin-shen Yang
Received: 25 February 2022
Accepted: 9 March 2022
Published: 11 March 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Clustering is one of the most significant applications in the big data field. However, applying clustering to big data requires substantial processing power and resources because of the complexity involved and the resulting increase in clustering time. Therefore, many techniques have been implemented to improve the performance of clustering algorithms, especially k-means clustering. In this paper, a neural-processor-based k-means clustering technique is proposed to cluster big data by leveraging the dedicated machine learning processors of mobile devices. The solution was designed to run on the single-instruction machine processor that exists in the mobile device's processor. Running k-means clustering in a distributed scheme based on mobile machine learning can efficiently handle big data clustering over the network.
The results showed that using a neural engine processor on a mobile smartphone device can maximize the speed of the clustering algorithm, with clustering running up to two times faster than on traditional laptop/desktop processors. Furthermore, the number of iterations required to obtain k clusters was improved by up to a factor of two relative to parallel and distributed k-means.

Keywords: big data; clustering; neural engine; k-means; parallel computing

1. Introduction

We are currently in an era of data flood, as shown by the massive amounts of data continuously generated at unprecedented and ever-increasing scales. Over the past decade, machine learning techniques have become increasingly popular in a wide range of large and complex data-intensive applications, such as astronomy, medicine, biology, and other sciences [1]. These strategies offer potential options for extracting hidden information from the data. However, as the era of big data approaches, datasets are growing so large and complex that conventional learning methods struggle to handle them: the learning process for traditional datasets was not designed for, and may not work well with, large amounts of data. Most classical machine learning algorithms are built to process data that are fully loaded into memory [2], an assumption that no longer holds in the big data context.

Massive volumes of data are useful only if the meaning of those data is guaranteed, and proper information can lead to the right path. The process of gathering information from huge unstructured or semi-structured data is the so-called clustering technique. Clustering groups elements based on the similarity of their characteristics and returns the resulting groups as clusters. Thousands of clustering algorithms have been published based on this concept, and k-means is one of the most widely used.
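To make the clustering concept above concrete, the following is a minimal sketch of the standard (Lloyd's) k-means procedure in plain Python. It is offered only as a baseline illustration and is not the neural-engine-based method proposed in this paper; the function names (`kmeans`, `squared_distance`, `mean_point`), the random initialization from the data points, and the stopping rule are assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    """Baseline Lloyd's k-means (illustrative assumption, not the paper's method).

    Returns (centroids, clusters, iterations), where clusters holds the
    assignment computed in the last iteration that was run.
    """
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    for iteration in range(1, max_iters + 1):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters, iteration


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, clusters, iterations = kmeans(data, k=2)
    print(centroids, iterations)
```

The returned iteration count is the quantity that the abstract refers to when it compares the number of iterations needed to obtain k clusters; in this sketch it depends on the assumed random initialization.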
k-means is used in a wide range of applications owing to its simple implementation and its effectiveness. In the literature, different modifications have been proposed to improve the performance and efficiency of the k-means clustering algorithm.

Big data analytics can extract useful information from large amounts of data generated by a variety of sources [3]. Although computer systems and Internet technologies