International Journal of Knowledge-based and Intelligent Engineering Systems 24 (2020) 205–215
DOI 10.3233/KES-200042
IOS Press

Literature review and analysis on big data stream classification techniques

B. Srivani a,∗, N. Sandhya b and B. Padmaja Rani c
a JNTUH, Hyderabad, Telangana, India
b CSE Department, VNRVJIET, Hyderabad, Telangana, India
c CSE Department, JNTUCEH, Hyderabad, Telangana, India

Abstract. Rapid growth in technology and information has led to marked growth in the velocity, volume, and variety of data. The data in business organizations demonstrates the development of big data applications. Because of the growing demand for such applications, the analysis of sophisticated streaming big data is becoming a significant area of data mining. One significant aspect of this research is the use of deep learning approaches for the effective extraction of complex data representations. Accordingly, this survey provides a detailed review of big data classification methodologies, such as deep learning based techniques, Convolutional Neural Network (CNN) based techniques, K-Nearest Neighbor (KNN) based techniques, Neural Network (NN) based techniques, fuzzy based techniques, and support vector based techniques. Moreover, a detailed study is made with respect to parameters such as evaluation metrics, implementation tool, employed framework, datasets utilized, adopted classification methods, and the accuracy range obtained by various techniques. Finally, the research gaps and issues of various big data classification schemes are presented.

Keywords: Big data streaming, classification, CNN, accuracy, Map Reduce framework

1. Introduction

With the rapid improvement of the World Wide Web (WWW), internet data grows and becomes big data. Based on the various properties of big data, big data classification is utilized in network bandwidth, security filtering, network data management, category management, network reputation management, green internet, etc.
Big data is characterized by velocity, volume, and variety [57]. As the variety, velocity, and volume of data increase, recent technologies are used to manage both the processing and the storage of the data. Big data analytics is the process of understanding and analysing the characteristics of vast datasets to extract statistical and geometric patterns [56]. Most generated data is originally streaming data, because it represents actions, measurements, and interactions that come from the internet. Data is produced over an interval of time. In the streaming framework, data arrives at high speed, and algorithms must process it under very strict constraints of time and space. Streaming algorithms utilize data structures to give fast and optimal answers [44].

Big data classification is a beneficial task in marketing, biomedicine, and social media applications. The model commonly employed to resolve the challenges of big data is the single traditional classification model [58]. The classifiers used for big data classification include Naive Bayes (NB), KNN, SVM, and so on. NB classifiers are widely used in information and data fusion and in machine analytics for big data classification [11]; NB is also used for classification and target tracking in cloud computing [12] as well as robotics control [13]. Image and text data are required for big data analysis, and cyber analysis involves elastic learning and scalable methods [10,14]. By its nature, the KNN classifier is slow and does not have a small, fixed-size trained model to use at test time [9].

∗ Corresponding author: B. Srivani, Research Scholar, JNTUH, Hyderabad, Telangana, India. E-mail: srivaanib@gmail.com.

ISSN 1327-2314/20/$35.00 © 2020 – IOS Press and the authors. All rights reserved
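
The remark that KNN is slow and has no small fixed-size model can be illustrated with a minimal sketch. It is not taken from any of the surveyed works: the class name `BruteForceKNN`, the toy points, and the choice of Euclidean distance with majority voting are all assumptions made for illustration. The point is that "training" only stores the data, while every prediction scans the whole stored set.

```python
import math
from collections import Counter


class BruteForceKNN:
    """Illustrative (hypothetical) brute-force k-nearest-neighbour classifier."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # No compact model is learned: the "model" is the full training set,
        # so its size grows with the amount of data seen.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Every prediction computes the distance to every stored point,
        # which is what makes plain KNN slow on large datasets.
        distances = [
            (math.dist(point, query), label)
            for point, label in zip(self.points, self.labels)
        ]
        nearest = sorted(distances, key=lambda pair: pair[0])[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy data (assumed for illustration): two well-separated clusters.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_labels = ["a", "a", "a", "b", "b", "b"]

model = BruteForceKNN(k=3).fit(train_points, train_labels)
```

Calling `model.predict((0.05, 0.05))` compares the query against all six stored points before voting. In a streaming setting the stored set keeps growing, so both memory use and per-query time grow with it unless the data is summarized or indexed.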