Naive Bayes and Decision Tree Classifier
for Streaming Data Using HBase
Aradhita Mukherjee, Sudip Mondal, Nabendu Chaki and Sunirmal Khatua
Abstract Classification of streaming data in a real-time environment is one of
the most challenging research areas today. Data streaming is used in real-time
environments where massive volumes of data are generated in small chunks that
must be processed very quickly. HBase is a good option for storing such massive,
heterogeneous collections of small data files while preserving scalability and
availability. In a real-time environment, data are generated at an exponential rate,
so storing continuously growing data requires dynamic splitting, which HBase supports.
We chose a data set of tobacco-affected student records and observed that the Naive
Bayes classifier is less complex and more accurate than the decision tree. In a
real-time environment, Naive Bayes also proves more effective than other classifiers
when the training sample is very large, a volume that HBase is able to handle. The
key-value store in HBase gives the classifiers an extra edge by reducing their
execution time.
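One reason Naive Bayes suits streaming data is that its model consists only of counts, which can be updated one record at a time without revisiting earlier data. The abstract does not say how the authors implemented their classifier, so the following is a minimal sketch under stated assumptions: a categorical Naive Bayes with Laplace smoothing, where the class name `StreamingNaiveBayes`, the smoothing constant `alpha`, and the two-attribute toy records are all hypothetical and are not taken from the paper's tobacco data set.

```python
from collections import defaultdict
import math


class StreamingNaiveBayes:
    """Hypothetical categorical Naive Bayes updated one record at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant (assumed)
        self.total = 0                          # records seen so far
        self.class_counts = defaultdict(int)    # label -> count
        # feature_counts[label][feature_index][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        self.feature_values = defaultdict(set)  # feature_index -> distinct values seen

    def update(self, features, label):
        """Absorb one labelled record; no pass over older records is needed."""
        self.total += 1
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[label][i][value] += 1
            self.feature_values[i].add(value)

    def predict(self, features):
        """Return the label with the highest smoothed log-posterior."""
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / self.total)  # log prior P(label)
            for i, value in enumerate(features):
                count = self.feature_counts[label][i].get(value, 0)
                # count the queried value as possible even if it was never seen
                n_values = len(self.feature_values[i] | {value})
                score += math.log((count + self.alpha) / (n_label + self.alpha * n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up stream of (features, label) records, purely illustrative.
stream = [
    (("yes", "urban"), "affected"),
    (("yes", "rural"), "affected"),
    (("no", "urban"), "not_affected"),
    (("no", "rural"), "not_affected"),
    (("yes", "urban"), "affected"),
]

model = StreamingNaiveBayes()
for features, label in stream:
    model.update(features, label)

print(model.predict(("yes", "urban")))  # affected
print(model.predict(("no", "rural")))   # not_affected
```

Training cost per record is linear in the number of attributes, whereas a decision tree typically needs to re-evaluate split criteria over many records when new data arrive; this is one plausible reading of the abstract's "less complex" claim, not a measurement from the paper.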
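HBase keeps rows sorted by row key and partitions them into regions; when a region grows past a configured limit it is split into two daughter regions, which is the dynamic splitting referred to in the abstract. The sketch below is a toy in-memory model of that behaviour, not the HBase API: `ToyRegionStore`, `Region`, and the row-count threshold `max_rows` are hypothetical simplifications (real HBase split policies are driven by store-file size, for example `hbase.hregion.max.filesize`, rather than row counts), and here a region is simply split at its middle row key.

```python
import bisect


class Region:
    """Hypothetical region: all rows whose key is >= start_key (up to the next region)."""

    def __init__(self, start_key, rows=None):
        self.start_key = start_key
        self.rows = rows if rows is not None else {}  # row key -> value


class ToyRegionStore:
    """Toy sorted key-value store that splits a region once it holds too many rows."""

    def __init__(self, max_rows=4):
        self.max_rows = max_rows        # assumed split threshold (rows, not bytes)
        self.regions = [Region("")]     # one region initially covers the whole key space

    def _region_index(self, key):
        # Regions stay ordered by start_key, so binary search finds the owner.
        starts = [r.start_key for r in self.regions]
        return bisect.bisect_right(starts, key) - 1

    def put(self, key, value):
        idx = self._region_index(key)
        region = self.regions[idx]
        region.rows[key] = value
        if len(region.rows) > self.max_rows:
            self._split(idx)

    def _split(self, idx):
        region = self.regions[idx]
        keys = sorted(region.rows)
        mid = keys[len(keys) // 2]      # split point: the middle row key
        left = Region(region.start_key, {k: v for k, v in region.rows.items() if k < mid})
        right = Region(mid, {k: v for k, v in region.rows.items() if k >= mid})
        self.regions[idx:idx + 1] = [left, right]

    def get(self, key):
        return self.regions[self._region_index(key)].rows.get(key)


store = ToyRegionStore(max_rows=4)
for n in range(10):
    store.put(f"row{n:02d}", n)

print(len(store.regions), [r.start_key for r in store.regions])
print(store.get("row07"))  # 7
```

Because each split only divides the region that overflowed, the store keeps accepting writes as data grow, and every lookup still touches a single region.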
Keywords Big data · HBase · Naive Bayes classifier · Real-time classification ·
Data streaming · Scalability
A. Mukherjee (✉) · S. Mondal · N. Chaki · S. Khatua
Department of Computer Science & Engineering, University of Calcutta,
Kolkata, West Bengal, India
e-mail: aradhita.mukherjee.2016@gmail.com
S. Mondal
e-mail: sudip.wbsu@gmail.com
N. Chaki
e-mail: nabendu@ieee.org
S. Khatua
e-mail: enggnimu_ju@yahoo.com
© Springer Nature Singapore Pte Ltd. 2019
R. Chaki et al. (eds.), Advanced Computing and Systems for Security,
Advances in Intelligent Systems and Computing 897,
https://doi.org/10.1007/978-981-13-3250-0_8