www.astesj.com

Enhancing Decision Trees for Data Stream Mining

Mostafa Yacoub 1,*, Amira Rezk 1, Mohamed Senousy 2

1 Faculty of Computers and Information, Information System Department, Mansoura University, Mansoura, 35511, Egypt
2 Faculty of Management Sciences, Computer and Information system Departments, Sadat Academy for Management Sciences, Cairo, 00202, Egypt

ARTICLE INFO

Article history:
Received: 29 July, 2021
Accepted: 15 October, 2021
Online: 23 October, 2021

ABSTRACT

Data streams have attracted considerable research attention for years. Mining this type of data poses special challenges because of its unusual nature: data streams are continuous, infinite, and unbounded in size. Because of its accuracy, the decision tree is one of the most common methods for classifying data streams. The aim of classification is to find a set of models that can be used to differentiate and label different classes of objects. The discovered models are used to predict the class membership of objects in a data set. Although much effort has gone into classifying stream data with decision trees, their performance still needs special attention, especially regarding time, which is an important factor for data streams. This fast type of data requires the shortest possible processing time. This paper presents VFDT-S1.0 as an extension of VFDT (Very Fast Decision Trees). Bagging and sampling techniques are used to reduce the algorithm's running time while maintaining accuracy. The experimental results show that the proposed modification reduces classification time by more than 20% on more than one dataset. The effect on accuracy was less than 1% on some datasets. The time results show that the algorithm is suitable for handling fast stream mining.

Keywords: data stream mining, classification, decision trees, VFDT

1. Introduction

Recently, information has come to play a major role in our world.
Subsequently, the process of extracting knowledge is becoming very important. New applications that depend on data streams have become more popular over time. Stream data are evident in sensor readings, telephone call records, click streams, social media, and the stock market. Unlike traditional data mining, which analyses a stored data set, stream mining analyses a data stream that cannot be stored in full, because it is infinite and storing it would require expensive storage capabilities. Data streams arrive continuously and at a fast pace, which prevents multiple passes over the data. Processing time is therefore more constrained for data streams.

Classification is a mining technique that builds a model from a training data set; the model is then used to predict the class label of new, unlabeled data. Decision trees, neural networks, Bayesian networks, and Support Vector Machines (SVM) are considered among the most effective classification methods.

Decision trees are hierarchical data structures for supervised learning by which the input space is split into local regions to predict the dependent variable [1]. They are classified as greedy algorithms, which build the tree through a series of small steps and make a decision at each one. Decision trees consist of nodes and edges (branches). The root node has no incoming edge. Leaves, or terminal nodes, have no outgoing edges. Every node other than the root has exactly one incoming edge. Internal nodes, or test nodes, are the nodes with outgoing edges. Each internal node splits the instance space into two or more sub-spaces. These splits are made according to a discrete splitting function of the attribute values (inputs). Classes are assigned to leaf nodes. Decision trees are characterized by simplicity, understandability, flexibility, adaptability, and higher accuracy [2], [3].
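The node-and-edge structure described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the VFDT or VFDT-S1.0 implementation: the `Node` class, the `classify` function, the "le"/"gt" edge keys, and the weather-style attributes, threshold, and class labels are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

Value = Union[str, float]


@dataclass
class Node:
    """A tree node: a leaf if it has no children, otherwise a test node."""
    label: Optional[str] = None        # class assigned to a leaf node
    attribute: Optional[str] = None    # attribute tested at an internal node
    threshold: Optional[float] = None  # set only for continuous attributes
    children: Dict[Value, "Node"] = field(default_factory=dict)  # outgoing edges

    def is_leaf(self) -> bool:
        return not self.children


def classify(node: Node, instance: Dict[str, Value]) -> str:
    """Follow one outgoing edge per test node, from the root down to a leaf."""
    while not node.is_leaf():
        value = instance[node.attribute]
        if node.threshold is not None:
            # Continuous attribute: binary split on a threshold.
            key = "le" if value <= node.threshold else "gt"
        else:
            # Categorical attribute: one edge per attribute value.
            key = value
        node = node.children[key]
    return node.label


# Hypothetical tree: the root tests a categorical attribute, and one of its
# children tests a continuous attribute without any normalization.
root = Node(
    attribute="outlook",
    children={
        "sunny": Node(
            attribute="humidity",
            threshold=75.0,
            children={"le": Node(label="play"), "gt": Node(label="stay")},
        ),
        "overcast": Node(label="play"),
        "rainy": Node(label="stay"),
    },
)

print(classify(root, {"outlook": "sunny", "humidity": 70.0}))  # play
```

In this sketch the root has no incoming edge, each leaf carries a class label, and each internal node splits the instance space by a discrete function of a single attribute value, which mirrors the definitions given in the text.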
The ability to handle both categorical and continuous data is an important advantage of decision trees. Consequently, there is no need to normalize the data before running the decision tree model, which means fewer preprocessing steps. Being easier to construct and understand is another important reason for preferring decision

ASTESJ, ISSN: 2415-6698
*Corresponding Author: Mostafa Yacoub, Email: mostafayacoub3@gmail.com
Advances in Science, Technology and Engineering Systems Journal, Vol. 6, No. 5, 330-334 (2021)
https://dx.doi.org/10.25046/aj060537