An Application on Ensemble Learning Using KNIME

Hilal ÇELIK
Computer Engineering
Big data
Elazig, Turkey
ORCID: https://orcid.org/0000-0001-5428-3411

Ahmet ÇINAR
Computer Engineering
Big data
Elazig, Turkey
ORCID: https://orcid.org/0000-0001-5528-2226

Abstract— Machine learning is the study of how computers can learn from data and from people's observations, much as humans do, without being explicitly programmed. In fact, machine learning is inspired by the learning processes of humans. Bayes' theorem, an important result in probability theory, underlies the Naive Bayes classifier. The algorithm calculates the probability of each class for an instance and assigns the instance to the class with the highest probability. A decision tree is a classification method that splits the data set according to shared features. It consists of "branches", "leaves" and "roots", just like real-world trees. In a decision tree, the topmost node is the root and the terminal nodes are the leaves; the branches between them encode the decisions that lead from the root to a leaf. Ensemble learning algorithms improve classification performance by combining several machine learning methods. In this study, the decision tree and Naive Bayes techniques were applied to the 215-record data set "Academic and Employability Factors Affecting Placement". The decision tree reached an accuracy of 91.892%, Naive Bayes reached 94.595%, and the ensemble reached 97.297%. Thus, the ensemble outperforms both of the individual algorithms. The study is implemented in KNIME, a platform described as "End-to-End Data Science".

Keywords— Ensemble Learning, Machine Learning, Naive Bayes, Decision Tree, Knime

I. INTRODUCTION

There are many applications of machine learning algorithms in the literature.
Machine learning is a branch of artificial intelligence that aims to enable machines to perform their tasks skillfully by using intelligent software. Statistical learning methods form the backbone of the intelligent software used to develop machine intelligence. Because machine learning algorithms require data to learn, the field is closely connected to the field of databases. Related terms include Knowledge Discovery from Data (KDD), data mining, and pattern recognition [1].

This study uses the decision tree and Naive Bayes techniques of the classification model, which is one of the data mining models. A classification model predicts the class of data whose class is unknown by using existing data whose class is known. A decision tree is a structure that divides a data set containing many records into smaller sets by applying a set of decision rules [2]. Bayes’ theorem is of fundamental importance for inferential statistics and for many advanced machine learning models. Bayesian reasoning is a logical approach to updating the probability of hypotheses in the light of new evidence, and it therefore rightly plays a pivotal role in science [3].

In this study, ensemble learning is carried out by combining the strengths of the decision tree classifier and the Naive Bayes classifier. Both models are run separately and MAVL (Majority Voting) is applied. Majority voting has been studied in the literature; in [4], the average accuracy of the majority vote approach was significantly better than that of the other classifiers tested.

II. KNIME

KNIME is commonly described as “End-to-End Data Science”. It is a software platform for creating and productionizing data science in one easy and intuitive environment, allowing each stakeholder in the data science process to focus on what they do best [5].
KNIME is open source data science software that facilitates the use of data mining tools such as data visualization, machine learning algorithms, and association rule extraction. A KNIME workflow consists of connected nodes. KNIME has many users around the world, who share their work on the official KNIME site so that other users can access it. Users can also get help from other members of the platform when they run into problems in their own work.

“Install KNIME Extensions” is one of the most important features to know about. When KNIME is first installed, it does not include many nodes and libraries. The nodes and libraries needed for a given workflow must be added after installation through "Install KNIME Extensions".

III. ENSEMBLE LEARNING

There are currently many methods for extracting meaningful and useful information from data, and machine learning algorithms come first among them. However, when different algorithms are run on the same data set, they give different results. As the saying goes, “the whole is greater than its parts”. This study therefore tries to benefit from several algorithms at the same time. Ensemble learning, which draws on the strength of the group, makes the most efficient use of multiple different algorithms working on the training and test data. Besides saving time, which has become much more important in recent years, this increases the probability that predictions on the test data are close to the truth. An ensemble is itself a supervised learning algorithm, since it can be trained and then used to make predictions.
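The majority voting idea described above can be sketched in a few lines of Python. This is an illustrative sketch only and does not reproduce the KNIME workflow used in the study: the function names, the toy labels, the third voter, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the label that most classifiers predicted for one record.

    Tie-breaking is an assumption of this sketch: the label that appears
    first in `votes` wins when several labels share the top count.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    best = max(counts.values())
    # Walk the votes in order so the earliest top-scoring label wins a tie.
    for label in votes:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def ensemble_predict(per_model: Sequence[Sequence[str]]) -> List[str]:
    """Combine row-aligned predictions from several models, record by record."""
    return [majority_vote(row) for row in zip(*per_model)]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of records whose predicted label matches the true label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical toy data, not taken from the paper's data set.
actual = ["Placed", "Placed", "Not Placed", "Placed", "Not Placed"]
tree = ["Placed", "Not Placed", "Not Placed", "Placed", "Placed"]
bayes = ["Placed", "Placed", "Placed", "Placed", "Not Placed"]
third = ["Not Placed", "Placed", "Not Placed", "Placed", "Not Placed"]

combined = ensemble_predict([tree, bayes, third])
print(accuracy(tree, actual), accuracy(bayes, actual), accuracy(combined, actual))
```

In this toy data each model errs on different records, so the vote corrects every individual mistake. With only two voters, as in a decision tree plus Naive Bayes pairing, ties are possible, which is why a tie-breaking rule must be chosen explicitly.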
2021 International Conference on Data Analytics for Business and Industry (ICDABI) | 978-1-6654-1656-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICDABI53623.2021.9655815

The educated community,