International Journal of Knowledge-based and Intelligent Engineering Systems 24 (2020) 205–215
DOI 10.3233/KES-200042
IOS Press

Literature review and analysis on big data stream classification techniques

B. Srivani a,*, N. Sandhya b and B. Padmaja Rani c
a JNTUH, Hyderabad, Telangana, India
b CSE Department, VNRVJIET, Hyderabad, Telangana, India
c CSE Department, JNTUCEH, Hyderabad, Telangana, India

Abstract. Rapid growth in technology and information has led to a marked increase in the velocity, volume, and variety of data. The data held by business organizations drives the development of big data applications. Because of the growing demand for such applications, the analysis of sophisticated streaming big data has become a significant area of data mining. One significant aspect of this research is the use of deep learning approaches for the effective extraction of complex data representations. Accordingly, this survey provides a detailed review of big data classification methodologies, such as deep learning based techniques, Convolutional Neural Network (CNN) based techniques, K-Nearest Neighbor (KNN) based techniques, Neural Network (NN) based techniques, fuzzy based techniques, and support vector based techniques. Moreover, a detailed study is made with respect to parameters such as evaluation metrics, implementation tool, employed framework, datasets utilized, adopted classification methods, and the accuracy range obtained by the various techniques. Finally, the research gaps and issues of various big data classification schemes are presented.

Keywords: Big data streaming, classification, CNN, accuracy, Map Reduce framework

1. Introduction

With the rapid growth of the World Wide Web (WWW), internet data keeps increasing and has become big data.
Based on the various properties of big data, big data classification is utilized in network bandwidth, security filtering, network data management, category management, network reputation management, green internet, etc. Big data is characterized by velocity, volume, and variety [57]. As the variety, velocity, and volume of the data increase, recent technologies are needed to manage both the processing and the storage of the data. Big data analytics is the process of understanding and analysing the characteristics of vast datasets to extract statistical and geometric patterns [56]. Most generated data is originally streaming data, because it represents actions, measurements, and interactions that come from the internet. The data is produced over intervals of time. In the streaming setting, data arrives at high speed, and algorithms must process it under very strict constraints of time and space. Streaming algorithms utilize data structures to give fast and optimal answers [44].

* Corresponding author: B. Srivani, Research Scholar, JNTUH, Hyderabad, Telangana, India. E-mail: srivaanib@gmail.com.

Big data classification is one of the beneficial tasks in marketing, biomedicine, and social media applications. The model commonly employed to resolve the challenges of big data is the single traditional classification model [58]. The classifiers used for big data classification include Naive Bayes (NB), KNN, and Support Vector Machine (SVM). NB classifiers are widely used in information and data fusion. Machine analytics is applied to big data classification [11], and NB is used for classification and target tracking in cloud computing [12] as well as in robotics control [13]. Image and text data are required for big data analysis, while cyber analysis relies on elastic learning and scalable methods [10,14]. By its nature, the KNN classifier is slow and does not have a small, fixed-size trained model to use at test time [9].
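
The remark on KNN can be illustrated with a minimal sketch. It is not taken from the surveyed works: the class name `BruteForceKNN`, the toy points, and the choice of Euclidean distance are assumptions made only for illustration. The sketch shows that "training" merely stores every sample, so the model grows with the data, and that each prediction scans all stored samples.

```python
import math
from collections import Counter


class BruteForceKNN:
    """Illustrative k-nearest-neighbour classifier (hypothetical, not from the survey)."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # "Training" only stores the data, so the model size grows
        # linearly with the number of training samples.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Each prediction computes a distance to every stored sample,
        # which is why plain KNN is slow on large data sets.
        distances = [
            (math.dist(query, point), label)
            for point, label in zip(self.points, self.labels)
        ]
        nearest = sorted(distances, key=lambda pair: pair[0])[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy data (assumed for illustration only).
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_y = ["a", "a", "b", "b"]
model = BruteForceKNN(k=3).fit(train_x, train_y)
print(model.predict((0.2, 0.1)))  # prints "a"
```

Practical systems usually reduce this cost with spatial indexes or approximate neighbour search, at the price of extra memory or reduced exactness.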
ISSN 1327-2314/20/$35.00 © 2020 – IOS Press and the authors. All rights reserved