Smart Controller Integrated with MQTT Broker Based on Machine Learning Techniques

Wial Hanon 1*, Mahdi Abed Salman 2

1 Information Technology, Software Department, University of Babylon, Hilla 51001, Iraq
2 College of Science for Women, Department of Computer Science, University of Babylon, Hilla 51001, Iraq

Corresponding Author Email: wailh@uobabylon.edu.iq

Copyright: ©2024 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

https://doi.org/10.18280/jesa.570109

Received: 28 October 2023
Revised: 16 January 2024
Accepted: 26 January 2024
Available online: 29 February 2024

ABSTRACT

Internet of Things (IoT) devices used in daily life and in numerous fields produce massive amounts of heterogeneous data, and these data streams need to be stored, processed, analyzed, and transmitted to the cloud. The streams often suffer from missing values and anomalies. System services also suffer from congestion due to slow processors, which leads to low throughput, high response time, slow decision-making, and data loss, and in turn to low quality of service and deteriorating system performance. This study proposes integrating a smart controller (SC) with the Message Queuing Telemetry Transport (MQTT) broker and the services in the fog node so that decisions are made automatically to prevent congestion in the system's services and to speed up processing. The services inspect the IoT stream for anomalies using one-class support vector machines (OCSVM). The SC then combines principal component analysis (PCA) with the k-nearest neighbors (KNN) algorithm to predict the efficient number of services that must be deployed in the system. The proposed operating model showed notably stable system performance in terms of throughput, latency, response time, and the amount of data loss, and it prevented congestion.
Keywords: Message Queuing Telemetry Transport (MQTT), smart controller, spawn, latency, throughput, screen table, data loss, PCA, k-nearest neighbors regression (KNNR)

1. INTRODUCTION

This era is witnessing growth in the number of IoT devices with many uses in daily life, such as smart homes, health care, wearable devices, production quality, and other fields [1]. Data from smart cities and health care are two examples of the many sources and formats of these vast volumes of data [2]. Data have become widely distributed and need effective techniques for resource management in storage, processing, and analysis [3], such as cloud computing [4, 5]. However, collecting raw data and sending it to the remote cloud suffers from high latency because of network congestion, and from low processing throughput.

Researchers have suggested using topic publication and geographical location at the edge of the cloud (the cloud computing gateway) to increase the deployment of IoT devices while preserving quality of service and throughput. However, using the edge with IoT applications faces challenges such as heterogeneous data sources, a lack of resources for large processing workloads, and low bandwidth [6]. Fog computing has emerged as a compromise solution that alleviates the obstacles to data production and processing in the cloud and addresses the edge/cloud computing challenges [7, 8]. Moreover, investing in the fog computing environment provides the resources required by IoT applications, reduces latency [3], and improves service quality [9].

The broker operates on a dynamic publish/subscribe model inside a fog node, which can support useful and flexible features such as anonymity, multiple publishers and subscribers, synchronization, and no system failure when one of the subscribers is not connected to the Internet [10-12].
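The publish/subscribe behavior described above can be illustrated with a minimal in-memory sketch. It is not an MQTT implementation and not the broker used in this work: the class `ToyBroker`, its methods, and the topic name `sensors/temperature` are hypothetical names chosen for illustration, and MQTT features such as QoS levels, retained messages, and topic wildcards are omitted. The sketch only shows that publishers and subscribers never reference each other directly, that several subscribers can share a topic, and that a disconnected subscriber does not stop delivery to the others.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]


class ToyBroker:
    """Hypothetical in-memory stand-in for a publish/subscribe broker."""

    def __init__(self) -> None:
        # Maps each topic name to the callbacks currently subscribed to it.
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def unsubscribe(self, topic: str, callback: Callback) -> None:
        # Models a subscriber going offline; unknown callbacks are ignored.
        if callback in self._subscribers[topic]:
            self._subscribers[topic].remove(callback)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver payload to every current subscriber; return the count."""
        delivered = 0
        for callback in list(self._subscribers.get(topic, [])):
            callback(topic, payload)
            delivered += 1
        return delivered


received_a: List[tuple] = []
received_b: List[tuple] = []


def on_a(topic: str, payload: str) -> None:
    received_a.append((topic, payload))


def on_b(topic: str, payload: str) -> None:
    received_b.append((topic, payload))


broker = ToyBroker()
broker.subscribe("sensors/temperature", on_a)
broker.subscribe("sensors/temperature", on_b)

# The publisher only names a topic; it does not know who is listening.
first = broker.publish("sensors/temperature", "21.5")

# Subscriber B disconnects; publishing continues for subscriber A.
broker.unsubscribe("sensors/temperature", on_b)
second = broker.publish("sensors/temperature", "22.0")

print(first, second, received_a, received_b)
```

Because the publisher addresses only a topic, subscribers can be added or removed without any change on the publishing side.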
The broker also provides a fast response time, enhances the performance of fog computing, and reduces lost messages [13].

Problems that occur in services, such as congestion, data loss, slow processing, and decreased system performance, motivate the proposed model, which integrates the broker and the SC with a group of services. In addition, a dynamic solution is needed that can evaluate the performance of the system's services at any time without human intervention.

This paper proposes integrating a smart controller module, which makes dynamic decisions to add (spawn) or remove (kill) services automatically, with an MQTT broker in a fog node. In the same context, the SC is a service that helps improve the performance of the system by monitoring and collecting measurement information on all services. Latency, throughput, and data loss caused by overload and high data processing time are the measures that the SC evaluates by applying the PCA and KNN algorithms. For reliability, a machine learning algorithm (one-class SVM) [14] is used in these services to preprocess data streams and detect anomalies. Integrating PCA with KNN regression (KNNR) allows effective feature selection, handling of multicollinearity, improved generalization, and computational efficiency. By leveraging the strengths of both techniques, the performance and efficiency of the regression model can be enhanced. Unsupervised machine learning algorithms like PCA aim to reduce the dimensionality (number of features)

Journal Européen des Systèmes Automatisés, Vol. 57, No. 1, February 2024, pp. 87-94. Journal homepage: http://iieta.org/journals/jesa