An Application on Ensemble Learning Using
KNIME
Hilal ÇELIK
Computer Engineering
Big data
Elazig, Turkey
ORCID: https://orcid.org/0000-0001-5428-3411
Ahmet ÇINAR
Computer Engineering
Big data
Elazig, Turkey
ORCID: https://orcid.org/0000-0001-5528-2226
Abstract— Machine learning is the study of how computers can
learn from data and observations, in a manner inspired by
human learning, without being explicitly programmed. The
Naive Bayes classifier is a machine learning method based on
Bayes' theorem, an important result in probability theory. The
algorithm calculates the probability of each class for an
element and assigns the element to the class with the highest
probability. A decision tree is a classification method that
splits the data set according to common features. Like a real
tree, it consists of a "root", "branches" and "leaves". The root
is at the top of the structure and the leaves are at the bottom;
the branches between them encode the decisions that lead from
the root to a leaf. Ensemble learning algorithms aim to improve
classification performance by combining several machine
learning methods. In this study, the decision tree and Naive
Bayes techniques were applied to the 215-record data set
"Academic and Employability Factors Affecting Placement".
The decision tree reached an accuracy of 91.892%, Naive
Bayes reached 94.595%, and the ensemble reached 97.297%.
The ensemble thus outperformed both of the individual
algorithms. The study was implemented in KNIME, a platform
described as "end-to-end data science".
Keywords— Ensemble Learning, Machine Learning, Naive
Bayes, Decision Tree, KNIME
I. INTRODUCTION
The literature contains many applications of
machine learning algorithms. Machine learning is a branch
of artificial intelligence that aims to enable machines to
perform their tasks skillfully by using intelligent software.
Statistical learning methods constitute the backbone of the
intelligent software used to develop machine intelligence.
Because machine learning algorithms require data to learn,
the discipline is closely connected to the discipline of
databases. Related terms include Knowledge Discovery from
Data (KDD), data mining, and pattern recognition [1]. In this
study, the decision tree and Naive Bayes techniques were
used; both belong to the classification model, which is one
of the data mining models. The classification model predicts
the class of data whose class is unknown by using existing
data whose class is known. A decision tree is a structure
used to divide a data set containing many records into
smaller sets by applying a set of decision rules [2]. Bayes’
theorem is of fundamental importance for inferential
statistics and many advanced machine learning models.
Bayesian reasoning is a logical approach to updating the
probability of hypotheses in the light of new evidence, and it
therefore rightly plays a pivotal role in science [3].
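For reference, the update rule described above and the resulting Naive Bayes decision rule can be written as follows. The symbols are notation chosen here for illustration and do not appear in the original text: H is a hypothesis, E is the evidence, c ranges over the classes, and x_1, ..., x_n are the feature values of the element to classify, assumed conditionally independent given the class.

```latex
% Bayes' theorem: posterior probability of hypothesis H given evidence E
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}

% Naive Bayes decision rule: choose the class with the highest posterior,
% assuming the features x_1, ..., x_n are conditionally independent given c
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The denominator P(E) is the same for every class, which is why it can be dropped when only the most probable class is needed.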
In this study, ensemble learning is carried out by combining
the strengths of the decision tree classifier and the Naive
Bayes classifier. Both classifiers are run separately, and
majority voting (MAVL) is then applied to their outputs.
Studies that use MAVL exist in the literature; in one of
them, the average accuracy of the majority vote approach
was reported to be significantly better than that of the other
tested classifiers [4].
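The voting step can be sketched in a few lines of Python. This is a minimal illustration, not the KNIME workflow used in the study: the function name `majority_vote`, the label strings, and the third classifier are all hypothetical, and the tie-breaking rule (the earliest-listed classifier wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by hard voting.

    predictions: list of equally long label lists, one per classifier.
    Ties go to the label predicted by the earliest-listed classifier
    (an assumption made for this sketch).
    """
    combined = []
    for votes in zip(*predictions):  # votes for one sample, in classifier order
        counts = Counter(votes)
        top = max(counts.values())
        # first label, in classifier order, that reaches the top count
        winner = next(label for label in votes if counts[label] == top)
        combined.append(winner)
    return combined


# Illustrative predictions for three samples (labels are made up)
tree_pred = ["Placed", "Not Placed", "Placed"]
nb_pred = ["Placed", "Placed", "Not Placed"]
other_pred = ["Placed", "Not Placed", "Not Placed"]

print(majority_vote([tree_pred, nb_pred, other_pred]))
```

With only two classifiers, every disagreement is a tie, so the tie-breaking rule decides those cases.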
II. KNIME
KNIME is commonly described as “End-to-End Data
Science”. It is a software platform for building and
producing data science using an easy and intuitive
environment that allows each stakeholder in the data science
process to focus on what they do best [5]. It is open source
data science software that facilitates the use of data mining
tools such as data visualization, machine learning
algorithms, and association rule extraction. A KNIME
project consists of a workflow of nodes.
KNIME has many users around the world. These
users share their work on KNIME's official site, where other
users can access it. Users who run into problems in their
own studies can also get help from other members of the
platform. “Install KNIME Extensions” is one of the most
important features to know about. When KNIME is first
installed, it contains only a limited set of nodes and
libraries; those suitable for the work at hand must be added
after installation. The nodes needed to build a workflow are
downloaded through “Install KNIME Extensions”.
III. ENSEMBLE LEARNING
Many methods currently exist for extracting
meaningful and useful information from data, and machine
learning algorithms are foremost among them. However,
when different algorithms are run on the same data set, they
give different results. As the saying goes, “the whole is
greater than its parts”. This study therefore tries to benefit
from several algorithms at the same time. Ensemble
learning, which relies on the strength of the group, allows
multiple different algorithms working on the training and
test data to be used in the most efficient way. Besides saving
time, which has become much more important in recent
years, it increases the probability that the predictions for the
test data are close to the truth.
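A short calculation illustrates why combining classifiers can help. The sketch below computes the accuracy of a majority vote over n classifiers under a strong simplifying assumption that is not made in the original text: every classifier is correct with the same probability p, and their errors are independent. Real classifiers trained on the same data are usually correlated, so the gain in practice is typically smaller. The function name `majority_accuracy` is hypothetical.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a strict majority of n voters is correct.

    Assumes each voter is independently correct with probability p
    and that n is odd, so no ties can occur.
    """
    need = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


# Three independent classifiers, each 80% accurate
print(round(majority_accuracy(0.8, 3), 3))  # 0.896
```

Under these assumptions the vote beats each individual classifier whenever p is above 0.5, and it does worse when p is below 0.5.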
An ensemble is itself a supervised learning
algorithm, since the ensemble as a whole can be trained and
then used to make predictions. The trained ensemble,
2021 International Conference on Data Analytics for Business and Industry (ICDABI)
978-1-6654-1656-6/21/$31.00 ©2021 IEEE 400
DOI: 10.1109/ICDABI53623.2021.9655815