206 | International Journal of Current Engineering and Technology, Vol.4, No.1 (Feb 2014)
Review Article
E-ISSN 2277 – 4106, P-ISSN 2347 - 5161
©2014 INPRESSCO®, All Rights Reserved
Available at http://inpressco.com/category/ijcet
Review: Approaches for Handling DataStream
Purva S. Gogte* and Deepti P. Theng
Department of Computer Science and Engineering, G. H. Raisoni College of Engineering, Nagpur, Maharashtra, India-441110
Accepted 10 January 2014, Available online 01 February 2014, Vol.4, No.1 (February 2014)
Abstract
Today, the widespread use of technology generates huge volumes of data known as data streams. Data
streams are continuous and unbounded, usually arrive at high speed, and change over time. Handling them raises
issues of memory, time, noise, and dynamic data. Data streams need special handling because of their changing
nature, and a stream may be labelled or unlabelled. Classification is supervised, so it can handle only labelled data. Thus,
there is a need for a Hybrid Ensemble Classifier in which clustering and classification are brought together so that both
labelled and unlabelled data streams can be handled. This paper describes different approaches for handling
data streams.
Keywords: Data Streams, Clustering, Classification
1. Introduction
In recent years, many sources of streaming data have been
developed. Tens of applications and millions of users
access the World Wide Web daily. Moreover, advances in
hardware such as wireless sensors and mobile
devices have increased the number of applications that generate
streaming data (Satpute Pravin C, 2012). A data stream is a
sequence of data items that arrive continuously at high
speed in real time, are implicitly or explicitly ordered
by timestamps, and are evolving and uncertain in nature. Data
stream mining has recently emerged as a growing field of
multidisciplinary research. It combines various research
areas such as databases, machine learning, artificial
intelligence, statistics, automated scientific discovery, data
visualization, decision science, and high performance
computing. Data stream classification has been a
widely studied research problem in recent years. The
dynamic and evolving nature of data streams requires
efficient and effective techniques that are significantly
different from static data classification techniques.
Mining data streams in large real-time
environments has become a challenging job because of the wide
range of applications that generate boundless streams of
data, such as log records, mobile application sensors,
emails, blogging, credit card fraud detection, medical
imaging, intrusion detection, weather monitoring, stock
trading, and planetary remote sensing.
*Corresponding author: Purva S. Gogte
Handling data streams raises several issues, which are
summarized as follows:
i) Large space: Data streams have enormous volumes of
continuously incoming data.
ii) Dynamic data: Data streams are fast, changing, and
uncertain, and they require a fast response that incorporates
changes in the data and reflects them in the output.
iii) Noise: Any approach applied to data streams should be
able to deal with noise and outliers.
iv) Single scan: Since data streams carry an unbounded volume of
fast, changing information, stream data
should be read only once.
v) Light weight: Techniques applied to vast data streams
should process the stream using little time and memory while
still providing an optimal output.
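The single-scan, light-weight, and noise requirements listed above can be illustrated with a small sketch. It is not a method from this paper: it assumes a numeric stream and uses Welford's online algorithm to maintain the mean and standard deviation in constant memory, flagging an item as a possible outlier when it lies more than a chosen number of standard deviations from the running mean. The names RunningStats and flag_outliers and the parameters z_threshold and warmup are hypothetical, introduced only for illustration.

```python
import math
from typing import Iterable, Iterator, Tuple


class RunningStats:
    """Single-pass mean and standard deviation (Welford's method), O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each item is seen exactly once and then discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until two items have been seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_outliers(
    stream: Iterable[float],
    z_threshold: float = 3.0,  # assumed cut-off, not taken from the paper
    warmup: int = 10,          # assumed number of items before flagging starts
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_outlier) pairs while reading the stream once."""
    stats = RunningStats()
    for x in stream:
        is_outlier = (
            stats.n >= warmup
            and stats.std > 0.0
            and abs(x - stats.mean) > z_threshold * stats.std
        )
        # Flagged items are not folded into the statistics, so a single
        # noisy reading does not distort the running estimate.
        if not is_outlier:
            stats.update(x)
        yield x, is_outlier
```

Because flagged items are excluded from the statistics, this sketch would keep rejecting values after a genuine, lasting change in the data. Handling the dynamic-data issue would need an additional mechanism, for example a sliding window or a decay factor.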
Data streams are essentially Big Data. The term
“Big Data” is used for large data sets whose size is beyond
the ability of commonly used software tools to capture,
manage, and process. Big Data sizes are a constantly
moving target, currently ranging from a few dozen
terabytes to many petabytes of data in a single data set.
Typical examples of Big Data today
include web logs, RFID-generated data, sensor networks,
satellite and geo-spatial data, social data from social
networks, Internet text and documents, Internet search
indexing, call detail records, astronomy, atmospheric
science, genomics, and biogeochemical data. Big Data has
emerged because we live in a society that makes
increasing use of data-intensive technologies.
Big Data poses many problems; for example, it is
difficult to use relational databases with Big Data. The
challenges faced in large-scale data management include
scalability, unstructured data, accessibility, real-time
analytics, fault tolerance, and many more. In addition to
variations in the amount of data stored in different sectors,
the types of data generated and stored (i.e., whether the
data encodes video, images, audio, or text/numeric
information) also differ markedly from industry to industry