A Smart Data Pre-Processing Approach by Using ML Algorithms on IoT Edges: A
Case Study
Şükrü Mustafa Kaya
Computer Engineering Department
Istanbul Aydin University
Istanbul, Turkey
smustafakaya@stu.aydin.edu.tr
Ali Güneş
Computer Engineering Department
Istanbul Aydin University
Istanbul, Turkey
aligunes@aydin.edu.tr
Atakan Erdem
Department of Biological Sciences
University of Calgary
Calgary, Canada
atakan.erdem1@ucalgary.ca
Abstract—The internet of things (IoT) is a technology that
allows many objects used in daily life to produce a variety of
data and transfer those data to other objects or systems. The
application domain of this system is increasing day by day, and
the technologies used for its infrastructure are also varied.
However, to process the huge amount of sensor data
effectively, smart and fast filtering solutions are required. As a
data pre-processing task, smart data filtering improves both
the data processing speed and the quality of the data. In other
words, big data management is facilitated because the filtered
data are less noisy and more meaningful, which leads to more
effective results. In this study, we examined big IoT data stored on IoT
edges to detect anomalies in temperature, age, gender, weight,
height, and time data. In this context, the Logistic Regression
algorithm was applied at both sensing and network layers for
anomaly detection purposes. Furthermore, the performance of
the classification algorithm in terms of speed and accuracy was
reported as the output of the study.
Keywords: internet of things; big data management; big data
analytics; data filtering
I. INTRODUCTION
As digitalization gains momentum worldwide, the
generation, collection, analysis, and storage of data that
facilitate our daily lives, and the establishment of
decision-making mechanisms based on meaningful data, have
gained importance. In parallel with these developments, IoT
technology has emerged. It includes cloud computing and
database systems, and it can detect sensing networks, devices,
or people that observe the physical world, produce and process
data, and perform decision-making processes.
The devices that make up this technology can communicate
with each other over the internet and share information. As a
result, IoT technology is used effectively in smart
agriculture, smart homes, smart industry, smart cities, and
smart energy systems. However, IoT devices cannot filter data
while producing them [1, 2]. IoT edges are the first place
where the generated data can be pre-processed before they go
to the cloud. Filtering at this stage is important because,
without it, the speed and accuracy of cloud services decrease
[3, 4]. Therefore, speed and accuracy are two important
criteria to consider. Since we are not aware of similar studies
that prioritize the speed and accuracy criteria within this
scope, we expect our study and its experimental results to
make important contributions to research in this field.
Studies in different IoT areas can be mentioned as examples to
show the importance of the problems we focus on.
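To make the two criteria concrete, the following is a minimal sketch of edge-side filtering with a logistic regression classifier, reporting both accuracy and filtering time. It is not the implementation used in this study. The synthetic temperature readings, the nominal value `NOMINAL`, the labelling rule, and the helper names `train_logistic` and `is_anomaly` are all assumptions made for illustration, and only a single feature (temperature) is used.

```python
import math
import random
import time

NOMINAL = 36.8  # assumed nominal temperature in degrees Celsius (illustrative)


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.5, epochs=300):
    """Fit p(anomaly | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log loss
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def is_anomaly(reading, w, b):
    """Classify one reading from its absolute deviation from NOMINAL."""
    return sigmoid(w * abs(reading - NOMINAL) + b) >= 0.5


# Synthetic data (assumption): 200 normal readings and 40 faulty ones.
random.seed(0)
normal = [random.gauss(NOMINAL, 0.3) for _ in range(200)]
faulty = [NOMINAL + random.choice((-1, 1)) * random.uniform(3.0, 8.0)
          for _ in range(40)]
readings = normal + faulty
labels = [0] * len(normal) + [1] * len(faulty)

features = [abs(r - NOMINAL) for r in readings]
w, b = train_logistic(features, labels)

# Speed criterion: time needed to filter the whole batch on the edge.
start = time.perf_counter()
kept = [r for r in readings if not is_anomaly(r, w, b)]
elapsed = time.perf_counter() - start

# Accuracy criterion: fraction of readings classified correctly.
correct = sum(is_anomaly(r, w, b) == bool(y) for r, y in zip(readings, labels))
accuracy = correct / len(readings)

print(f"accuracy={accuracy:.3f}, removed {len(readings) - len(kept)} of "
      f"{len(readings)} readings in {elapsed * 1e3:.2f} ms")
```

Only the readings in `kept` would be forwarded to the cloud in this sketch. Evaluating on the training data keeps the example short; a real evaluation would use a held-out test set.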
Eugene S. et al. [5] examine the benefits of a wide range
of efficient, successful, and innovative applications and
services for the IoT and big data analysis. The study aims to
examine data analysis applications in different IoT areas, to
provide a classification of analytical approaches, and to put
forward a layered taxonomy from internet of things data to
analytics. The taxonomy supplies insight into the
appropriateness of analytical techniques, and the resulting
information identifies the technology and infrastructure
needed for IoT analytics. Finally, the study investigates
developments that will shape future research on the IoT. In
their article, Gunasekaran M. et al. suggest a new IoT
architecture to store and process scalable big sensor data for
healthcare applications. The suggested architecture consists
of two key sub-architectures: the meta fog routing (MF-R) and
grouping and selection (GC) architectures. The MF-R
architecture uses big data technologies such as Apache Pig and
Apache HBase to collect and store the big sensor data produced
by different sensor devices. The GC architecture enables the
integration of fog computing with cloud computing. In
addition, a MapReduce-based prediction model is used to
predict heart diseases with this architecture [5, 6].
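Prediction-based data reduction, one of the related approaches reviewed in this section [7], can be illustrated with a minimal sketch. The sensor and the base station run identical predictors, and the sensor transmits a reading only when it deviates from the prediction by more than a threshold. This sketch is a simplification: it uses a single normalized LMS filter rather than the combination of two LMS filters described in [7]. The class `LMSPredictor`, the function `simulate`, the threshold, and the synthetic signal are hypothetical.

```python
import math

THRESHOLD = 0.5  # assumed maximum tolerated deviation, in sensor units


class LMSPredictor:
    """One-step-ahead predictor based on a normalized LMS filter."""

    def __init__(self, order=4, mu=0.5, eps=1e-6):
        self.weights = [0.0] * order
        self.history = [0.0] * order  # most recent values, oldest first
        self.mu = mu
        self.eps = eps

    def predict(self):
        return sum(w * x for w, x in zip(self.weights, self.history))

    def update(self, value):
        """Adapt the weights toward `value`, then shift it into the history."""
        error = value - self.predict()
        norm = self.eps + sum(x * x for x in self.history)
        step = self.mu * error / norm
        self.weights = [w + step * x
                        for w, x in zip(self.weights, self.history)]
        self.history = self.history[1:] + [value]


def simulate(readings, threshold=THRESHOLD):
    """Return the series rebuilt at the base station and the number sent."""
    sensor, sink = LMSPredictor(), LMSPredictor()
    reconstructed, sent = [], 0
    for value in readings:
        predicted = sensor.predict()
        if abs(value - predicted) > threshold:
            # Significant deviation: transmit, and both sides adapt.
            sent += 1
            sensor.update(value)
            sink.update(value)
            reconstructed.append(value)
        else:
            # No transmission: both sides continue with the prediction,
            # which leaves the weights unchanged (the error is zero).
            estimate = sink.predict()
            sensor.update(predicted)
            sink.update(estimate)
            reconstructed.append(estimate)
    return reconstructed, sent


# Slowly varying synthetic signal (assumption), e.g. ambient temperature.
readings = [20.0 + 3.0 * math.sin(2 * math.pi * k / 200) for k in range(400)]
reconstructed, sent = simulate(readings)
print(f"transmitted {sent} of {len(readings)} readings")
```

Because both predictors see exactly the same inputs, the base station's estimate never differs from the true reading by more than the threshold, while readings that are predicted well are never transmitted.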
Yasmin F. et al. propose an adaptive data reduction method.
The proposed method is an estimation-based data reduction
scheme that uses least mean squares (LMS) adaptive filters.
Specifically, both the source and base station nodes use a
convex combination of two decoupled LMS window filters of
different sizes to predict the next measured values, so that
the sensor nodes need to transmit a detected value only when
it deviates significantly from the predicted value [7].
Another article proposes a new model for the effective
management of big data generated by different sources, such as
sensor data that do not require human intervention, by
optimizing virtual machine selection. The model aims to
optimize the storage of patients’ data in order to provide a
real-time data retrieval mechanism and thus to improve the
performance of health systems [8]. In another study, studies
on the internet of
2021 International Conference on Artificial Intelligence of Things (ICAIoT)
978-1-6654-0176-0/21/$31.00 ©2021 IEEE
DOI 10.1109/ICAIoT53762.2021.00014