International Journal of Engineering and Advanced Technology (IJEAT)
ISSN: 2249 – 8958, Volume-9 Issue-2, December, 2019
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number: B3901129219/2019©BEIESP
DOI: 10.35940/ijeat.B3901.129219
Big Data Knowledge Discovery Platforms: A 360
Degree Perspective
Neelam Singh, Devesh Pratap Singh, Bhasker Pant
Abstract: Big Data is a buzzword affecting nearly every
domain and providing a new set of opportunities for the
development of the knowledge discovery process. It comes,
however, with challenges such as abundance, extensiveness and
diversity, timeliness and dynamism, messiness and vagueness,
and with uncertainty, since the data are not generated to answer
any specific question and may be associated with another process
or activity. These challenges cannot be handled by traditional
infrastructure, platforms and frameworks. New analytical
techniques and high performance computing architectures have
emerged to handle this explosion. These platforms and
architectures give a cutting edge to the Big Data Knowledge
Discovery process by using Artificial Intelligence, Machine
Learning and Expert Systems.
This study presents a comprehensive review of Big Data
analytical platforms and frameworks together with their
comparative analysis. A Knowledge Discovery architecture for
Big Data Analytics is also proposed, considering the fundamental
aspect of gaining insights from Big Data sets. The analysis also
identifies the open challenges associated with these techniques
and future research directions.
Keywords: Big Data, Knowledge Discovery, Artificial
Intelligence, Expert Systems.
I. INTRODUCTION
The data explosion has initiated the Big Data phenomenon. The
term “Big Data”, in its present sense, came into use in the
late 1990s. Francis X. Diebold's first paper on the subject,
“Big Data Dynamic Factor Models for Macroeconomic
Measurement and Forecasting”, in the year 2000 (published in
2003), marked the beginning of today's much sought-after
topic, although credit for using the term is given to John
Mashey, the chief scientist of Silicon Graphics (SGI), who
used it in an SGI slide deck under the heading "Big Data and
the Next Wave of InfraStress".
We are witnessing the Big Data period. The issue here is not
obtaining data but obtaining accurate data and deploying
computing power to boost our domain knowledge and to recognize
patterns that could not previously be classified or explored.
“Big Data” is identified as a phenomenon in which the
traditional functional abilities of enterprises have become less
effective and scalable for storing, processing, analyzing and
visualizing data. Big Data encompasses the gathering and
processing of very large data sets and the related architectures
and procedures required to evaluate them.
Revised Manuscript Received on December 15, 2019.
Neelam Singh, Assistant Professor, Department of Computer Science
and Engineering, Graphic Era Deemed to be University, Dehradun
(Uttarakhand) India.
Dr. Devesh Pratap Singh, Professor & Head of Computer Science and
Engineering Department, Graphic Era Deemed to be University, Dehradun
(Uttarakhand) India.
Dr. Bhasker Pant, Dean Research & Development and Associate
Professor, Department of Computer Science and Engineering, Graphic Era
Deemed to be University, Dehradun (Uttarakhand) India.
Big Data architectures come in a variety of paradigms, spanning
multiple machines as clusters or distributed systems, with
specialized processes to handle the knowledge discovery
process.
The integration of the knowledge discovery process with the Big
Data drive opens a range of unique opportunities for
organizations in terms of future strategy, competitive edge
and more. Yet Big Data brings with it unidentified and
distinctive architectural and algorithmic challenges.
Knowledge Discovery from Data (KDD) can be defined as a
collection of processes integrated to extract novel features
and knowledge from multifaceted datasets. KDD is an
interdisciplinary domain spanning Bioinformatics, Astronomy,
Computer Science, Statistics, IoT and Recommender Systems, to
name a few. Tools and techniques for Knowledge Discovery are
drawn from paradigms including distributed programming,
machine learning, statistical inference, visualization and
high performance computing.
Colossal data sets, i.e. Big Data, contain hidden patterns and
knowledge that can be discovered through the knowledge
discovery in databases (KDD) process, which conventionally
performs data selection, preprocessing, subsampling,
conversion, pattern discovery, post-processing and knowledge
exploitation in sequential order. Areas such as business
intelligence, medicine, bioinformatics, military, education and
research are highly influenced and benefited by the application
of data mining techniques. Advancements in areas such as
classification and pattern matching have increased the
potential to acquire domain-specific, unexplored knowledge and
value.
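
The sequential KDD stages listed above can be illustrated with a
minimal sketch, assuming a toy in-memory data set. The records,
the function names and the 0.5 support threshold are hypothetical
and are not taken from this paper. Counting co-occurring item
pairs stands in for the pattern discovery stage, and knowledge
exploitation is represented only by the returned result.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy records (assumption, for illustration only).
RAW = [
    {"user": 1, "items": ["Milk", "bread", None]},
    {"user": 2, "items": ["bread", "milk ", "eggs"]},
    {"user": 3, "items": ["eggs", "bread"]},
    {"user": 4, "items": []},
    {"user": 5},
]

def select(records):
    """Data selection: keep only the field relevant to the question."""
    return [r["items"] for r in records if "items" in r]

def preprocess(baskets):
    """Preprocessing: drop missing values, normalise text, drop empty baskets."""
    cleaned = [sorted({i.strip().lower() for i in b if i}) for b in baskets]
    return [b for b in cleaned if b]

def subsample(baskets, fraction, seed=0):
    """Subsampling: draw a reproducible random fraction of the baskets."""
    rng = random.Random(seed)
    k = max(1, int(len(baskets) * fraction))
    return rng.sample(baskets, k)

def transform(baskets):
    """Conversion: represent each basket as its set of item pairs."""
    return [list(combinations(b, 2)) for b in baskets]

def discover(pair_lists):
    """Pattern discovery: count how often each item pair co-occurs."""
    counts = Counter()
    for pairs in pair_lists:
        counts.update(pairs)
    return counts

def postprocess(counts, n_baskets, min_support):
    """Post-processing: keep pairs whose support meets the threshold."""
    return {
        pair: c / n_baskets
        for pair, c in counts.items()
        if c / n_baskets >= min_support
    }

def kdd(records, fraction=1.0, min_support=0.5):
    """Run the stages in order and return the frequent item pairs."""
    baskets = preprocess(select(records))
    sample = subsample(baskets, fraction)
    counts = discover(transform(sample))
    return postprocess(counts, len(sample), min_support)

if __name__ == "__main__":
    for pair, support in sorted(kdd(RAW).items()):
        print(pair, round(support, 2))
```

Each stage is a separate function so that, on real Big Data, any
single stage could be replaced by a distributed implementation
without changing the order of the others.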
II. LITERATURE REVIEW
The “National Institute of Standards and Technology
(NIST)” [1] suggests that, “Big Data is where the data
volume, acquisition velocity, or data representation limits
the ability to perform effective analysis using traditional
relational approaches or requires the use of significant
horizontal scaling for efficient processing.”
Big Data is accumulated from heterogeneous data-producing
sources. Examples include a smart wearable that records the
number of steps a person has walked during the day, along with
statistics such as kcal burnt, heart rate, average speed and
other activities like cycling and swimming, and the terabytes
of data to be produced by the planned Square Kilometre Array
telescope. Petabytes of data are accumulated and created every
day by social networking sites like Twitter, by scientific
experiments and by sensors [2]. Owing to its inherent
characteristics, Big Data poses the following challenges: