Cluster Comput
DOI 10.1007/s10586-017-1212-x
A revised framework of machine learning application for optimal
activity recognition
Mohsin Bilal 1 · Faisal K. Shaikh 2 · Muhammad Arif 1 · Mudasser F. Wyne 3
Received: 22 March 2017 / Revised: 30 August 2017 / Accepted: 17 September 2017
© Springer Science+Business Media, LLC 2017
Abstract Data science augments manual data understanding with
machine learning for a potential performance increase. In this
paper, data science methodology is examined to enhance machine
learning application in smartphone-based automatic human activity
recognition (HAR). As a result, a modified feature engineering
step and a novel post-learning data engineering step are proposed
in the machine learning framework as an alternative to data
understanding for effective HAR. The proposed framework is
examined on two different HAR data sets, demonstrating the
possibility of data-driven machine learning for near-optimal
classification of activities. The proposed framework exhibited
effectiveness and efficiency when compared with existing methods.
The modified feature engineering resulted in 42% fewer features
being required by a support vector machine to yield 97.3% correct
recognition of human physical activities. The addition of
post-learning data engineering further improved the model to 99%
accurate classification, which is almost optimal performance.
Keywords Data-driven machine learning framework ·
Activity recognition · Post-learning data engineering ·
Composite feature set · Smartphone sensors
Mohsin Bilal
mbhashmi@uqu.edu.sa
1 College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
2 Department of Telecommunication Engineering, Mehran University of Engineering and Technology, Jamshoro 76062, Pakistan
3 School of Engineering and Computing, National University, San Diego, USA
1 Introduction
Data and algorithms are central to computing and informatics.
Over the last few decades, machine learning has devoted itself to
developing algorithms for learning the underlying patterns in
data. This has resulted in a set of valuable learning algorithms
for many real-world applications. These applications are changing
the way science, engineering, arts, and entertainment are done.
At the same time, the widespread use of computing technologies
has allowed data to be captured continuously and at large scale.
A huge amount of data is being generated every second in many
different forms. Learning from this data is becoming a challenge
for researchers in computing and informatics. In due course, new
fields are emerging in which the study of data is augmented with
machine learning for better knowledge discovery. A recent trend,
widely known as data science, focuses on business analytics and
decision support. In this paper, we argue that data science wraps
more data understanding around the machine learning algorithm to
obtain greater benefits for business analytics and decision
support. Alternatively, it may become a natural heuristic for
optimizing the performance of a machine learning framework in
general. Therefore, in this paper, we investigate the performance
of machine learning application in light of current data science
practices, with the following motivations.
– The major goal of this research is to augment data
understanding by dual data engineering in the machine learning
framework in such a way that it extends the performance
boundaries of machine learning applications. This is
consistent with the data science methodology, where
post-learning data understanding is done manually, mainly
for model optimization to achieve the targeted learning
goals. However, in addition to serving the current