Naive Bayes and Decision Tree Classifier for Streaming Data Using HBase

Aradhita Mukherjee, Sudip Mondal, Nabendu Chaki and Sunirmal Khatua

Abstract Classification of streaming data in real-time environments is currently one of the most challenging research areas. Data streaming is used in real-time environments where massive volumes of data are generated in small chunks that must be processed very quickly. HBase is a good option for storing such massive, heterogeneous collections of small data files while preserving scalability and availability. In real-time environments, data are generated at an exponentially growing rate, so storing this continuously growing data requires dynamic splitting, which HBase supports. Using a dataset of tobacco-affected student records, we observed that the Naive Bayes classifier is less complex and more accurate than the decision tree classifier. In a real-time environment, Naive Bayes also proves more effective than the alternatives when the training sample is very large, with that sample managed by HBase. The key-value store in HBase gives the classifiers an additional edge by reducing their execution time.

Keywords Big data · HBase · Naive Bayes classifier · Real-time classification · Data streaming · Scalability

A. Mukherjee (✉) · S. Mondal · N. Chaki · S. Khatua
Department of Computer Science & Engineering, University of Calcutta, Kolkata, West Bengal, India
e-mail: aradhita.mukherjee.2016@gmail.com

S. Mondal
e-mail: sudip.wbsu@gmail.com

N. Chaki
e-mail: nabendu@ieee.org

S. Khatua
e-mail: enggnimu_ju@yahoo.com

© Springer Nature Singapore Pte Ltd. 2019
R. Chaki et al. (eds.), Advanced Computing and Systems for Security, Advances in Intelligent Systems and Computing 897, https://doi.org/10.1007/978-981-13-3250-0_8
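One reason Naive Bayes suits streaming data is that its model consists only of counts, which can be updated chunk by chunk without revisiting earlier records. The sketch below is an illustrative categorical Naive Bayes with Laplace smoothing, not the authors' implementation. It keeps its counters in plain Python dicts; in an HBase deployment such counters could in principle be stored as cells updated with HBase's atomic increment operation, but that mapping is an assumption here. The class name `StreamingNaiveBayes`, the feature names `smokes_at_home` and `peer_use`, and the toy records are all hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import defaultdict


class StreamingNaiveBayes:
    """Hypothetical categorical Naive Bayes that learns incrementally.

    The model is only a set of counters, so each new chunk of the stream
    just increments counts (plain dicts stand in for a key-value store).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = defaultdict(int)    # label -> number of records
        self.feature_counts = defaultdict(int)  # (label, feature, value) -> count
        self.feature_values = defaultdict(set)  # feature -> distinct values seen
        self.total = 0                          # records seen so far

    def partial_fit(self, chunk):
        """Update the counters from one chunk of (features, label) pairs."""
        for features, label in chunk:
            self.class_counts[label] += 1
            self.total += 1
            for name, value in features.items():
                self.feature_counts[(label, name, value)] += 1
                self.feature_values[name].add(value)

    def predict(self, features):
        """Return the label with the highest log posterior score."""
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)  # log prior P(label)
            for name, value in features.items():
                # Number of possible values, counting the queried one.
                k = len(self.feature_values.get(name, set()) | {value})
                seen = self.feature_counts.get((label, name, value), 0)
                # Smoothed log likelihood P(value | label).
                score += math.log((seen + self.alpha) / (count + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up records arriving as two chunks of a stream.
chunk1 = [
    ({"smokes_at_home": "yes", "peer_use": "high"}, "affected"),
    ({"smokes_at_home": "yes", "peer_use": "low"}, "affected"),
    ({"smokes_at_home": "no", "peer_use": "low"}, "not_affected"),
]
chunk2 = [
    ({"smokes_at_home": "no", "peer_use": "low"}, "not_affected"),
    ({"smokes_at_home": "no", "peer_use": "high"}, "not_affected"),
]

model = StreamingNaiveBayes()
model.partial_fit(chunk1)
model.partial_fit(chunk2)
print(model.predict({"smokes_at_home": "yes", "peer_use": "high"}))
```

Because the counts are additive, training on the two chunks one after another gives the same model as training on their concatenation in a single batch, which is what makes this style of classifier convenient when data keep arriving.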