International Journal of Electrical and Computer Engineering (IJECE)
Vol. 14, No. 2, April 2024, pp. 1959~1968
ISSN: 2088-8708, DOI: 10.11591/ijece.v14i2.pp1959-1968
Journal homepage: http://ijece.iaescore.com

Comparison of Iris dataset classification with Gaussian naïve Bayes and decision tree algorithms

Yasi Dani, Maria Artanta Ginting
Computer Science Department, School of Computer Science, Bina Nusantara University, Bandung Campus, Bandung, Indonesia

Article Info

Article history:
Received Aug 3, 2023
Revised Oct 16, 2023
Accepted Dec 13, 2023

Keywords:
Classification
Decision tree
Gaussian naïve Bayes
Iris dataset
Machine learning

ABSTRACT

In this study, we apply two classification algorithms, namely the Gaussian naïve Bayes (GNB) and the decision tree (DT) classifiers. The Gaussian naïve Bayes classifier is a probability-based classification model that predicts future probabilities based on past observations. The decision tree classifier, in contrast, is based on a decision tree: a series of tests performed adaptively, where the outcome of each test determines the next test. Both methods are simulated on the Iris dataset, which consists of three types of Iris: setosa, virginica, and versicolor. The data is divided into two parts, training data and testing data, and each sample has several features that describe the characteristics of the flower. To evaluate the performance of both algorithms and determine the best one for this dataset, we compute several metrics on the training and testing data for each method. These metrics include recall, precision, F1-score, and accuracy, where a higher value indicates better performance. The results show that the decision tree classifier performs best on the Iris dataset.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Yasi Dani
Computer Science Department, School of Computer Science, Bina Nusantara University
Pasir Kaliki street No.25-27, Ciroyom, Bandung 40181, Indonesia
Email: yasi.dani@binus.ac.id

1. INTRODUCTION

Classification is a widely used machine learning approach in data mining, and many methods are available for classifying a dataset [1], [2]. Classifications that involve two classes are called binary classifications [3], [4], while those that involve more than two classes are called multi-class classifications [5]–[7]. Classification techniques are needed in real applications such as medical disease analysis, text classification, smartphone user classification, and image classification [8]–[10].

In recent years, many researchers have studied machine learning classification methods using the Iris dataset. Wu et al. [11] compared the classification of the Iris dataset using the boosting tree, random forest, and GraftedTrees algorithms, where the performance of the algorithms remained below 0.85. Thirunavukkarasu et al. [12] classified the Iris dataset using the KNN algorithm and evaluated its performance using a single metric, accuracy, which was 0.975 on the training data and 0.967 on the test data. Swain et al. [13] studied neural networks to classify the Iris dataset and evaluated the performance of the algorithm using a single metric, accuracy, whose values were in the interval [0.833, 0.967]. Ghazal et al. [14] compared three classification algorithms, namely decision tree, neural networks, and naïve Bayes, using WEKA software; these algorithms were evaluated using a single metric, the ROC curve, whose values were in the interval [0.941, 0.955].
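Most of the studies above report a single metric. As an illustration of how accuracy and the per-class precision, recall, and F1-score named in the abstract can be computed for a three-class problem such as Iris, a minimal Python sketch follows. The function name classification_metrics and the toy label lists are assumptions made for this example only; they are not the authors' code and not results from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, per_class) for a multi-class problem.

    per_class maps each label to its precision, recall and F1-score,
    computed one-vs-rest from true/false positives and false negatives.
    Hypothetical helper written for illustration.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correctly predicted this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    accuracy = sum(tp.values()) / len(y_true)

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]   # samples predicted as label
        actual = tp[label] + fn[label]      # samples that truly are label
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class


# Toy labels invented for this example (not data from the paper).
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

accuracy, per_class = classification_metrics(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}")
for label, scores in per_class.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

In this toy case one versicolor sample is predicted as virginica, which lowers the recall of versicolor and the precision of virginica while leaving setosa unaffected. Accuracy alone would not show which class the error falls on.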