ISSN: 1220-1766 eISSN: 1841-429X
ICI Bucharest © Copyright 2012-2017. All rights reserved
1. Introduction
Nowadays, the Internet of Things (IoT) is
growing quickly as a subset of big data. Billions of
physical devices, such as smart devices [4]
and Wireless Sensor Networks (WSNs) [15],
are expected to be connected in
the near future. WSNs are used in various
applications and services by public and private
organizations, especially in the
medical field and health care. Therefore, the
data gathered from WSNs
are considered to be a great source of big data.
With the recent advancements in communication
technology, more and more data are generated
and collected. Big data will therefore grow
exponentially, which will increase the challenge
of extracting and retrieving the valuable
knowledge hidden in it. There are more than
three billion users of smart objects, including
smart phones, smart homes, as well as business
and entertainment applications [16]. These smart
devices allow Machine to Machine (M2M)
electronic communication with or without an
intermediary user. This has led to what is known
as the “Internet of Things (IoT)” [8]. The huge
amount of generated data has been useful in
various fields, such as commercial, industrial,
scientific, social and medical [11], as shown in
Figure 1.
Big data is a collection of very large datasets
with a great diversity of types, which makes it
difficult to process using state-of-the-art
data processing approaches or traditional data
processing platforms such as Processing Big
A Big Data Framework for
Mining Sensor Data Using Hadoop
Engy A. EL-SHAFEIY*, Ali I. EL-DESOUKY
Computers and Systems Department, Faculty of Engineering,
Mansoura University, Egypt
(*Corresponding author) e-mail: engy.elshafeiy@gmail.com.
Abstract: The data gathered from the IoT is considered to be of high business value. IoT devices sense natural conditions
using a sensor network composed of sensor nodes. Mining big sensor data to extract useful knowledge is a very
challenging task. Frequent itemset mining is one of the most effective mining techniques for finding important itemsets in big
sensor data. In this paper, a MapReduce Frequent Nodesets-based Boundary POC tree (MR-FNBP) framework is proposed
for mining Frequent Nodesets from big sensor data. The MapReduce framework is used to implement MR-FNBP to enhance
its performance in highly distributed environments. Additionally, the proposed FNBP creates a Boundary at an
early stage to exclude the infrequent itemsets, which may reduce the overall memory and time usage. Moreover, a number
of experiments were performed to evaluate the performance of the MR-FNBP framework. The results show high scalability
and a less time-consuming process for the MR-FNBP framework compared with several recent systems.
Keywords: Big data, Internet of Things, MapReduce, Wireless Sensor Networks, Mining Frequent Nodesets.
Trajectory Data [19]. In 2012, Gartner
gave a more detailed definition: Big data
are high-volume, high-velocity, and/or high-variety
information assets that require new forms
of processing to enable enhanced decision making,
insight discovery and process optimization. Big
data was originally characterized by the 3Vs
(Volume, Velocity, and Variety); this was later
elaborated with Veracity, Viability, and Value
to give the following characteristics, known as
the 6Vs:
Volume: Describes the huge data size.
Velocity: Describes the data communication and processing speeds per time unit.
Variety: Describes the different data types (structured, semi-structured, and unstructured).
Value: Describes the valuable data knowledge.
Veracity: Describes the data quality, such as data cleaning and filtering.
Viability: Describes the prediction possibilities.
More generally, a dataset can be called big
data if it is difficult to capture, analyze
and visualize using current
technology. With diversified data sources,
such as sensor networks, telescopes, scientific
experiments, and high-throughput instruments,
datasets grow at an exponential rate [18].
Other big data applications lie in many scientific
disciplines such as astronomy, atmospheric
science, medicine, genomics, biology,
biogeochemistry and other complex and
interdisciplinary scientific research. Web-based
Studies in Informatics and Control 26 (3) 365-376, September 2017
https://doi.org/10.24846/v26i3y201712