IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 11, No. 1, March 2022, pp. 276~283
ISSN: 2252-8938, DOI: 10.11591/ijai.v11.i1.pp276-283
Journal homepage: http://ijai.iaescore.com

Model optimisation of class imbalanced learning using ensemble classifier on over-sampling data

Yulia Ery Kurniawati, Yulius Denny Prabowo
Department of Informatics, Faculty of Computer Science and Design, Institut Teknologi dan Bisnis Kalbis, Jakarta, Indonesia

Article Info
Article history: Received May 28, 2021; Revised Dec 23, 2021; Accepted Jan 4, 2022

ABSTRACT
Data imbalance is one of the problems in the application of machine learning and data mining. The imbalance often affects the most essential and needed case entities. Two approaches to overcoming this problem are the data-level approach and the algorithmic approach. This study aims to obtain the best model on a pap smear dataset by combining the data-level approach with the algorithmic approach to handle imbalanced data. Laboratory data are mostly small and imbalanced, and in almost every case the minority entities are the most important and needed. The over-sampling methods used as the data-level approach in this study are the synthetic minority oversampling technique-nominal (SMOTE-N) and adaptive synthetic-nominal (ADASYN-N) algorithms. The algorithmic approach is an ensemble classifier using AdaBoost and bagging with the classification and regression tree (CART) as the base learner. In the experimental results, the best model in terms of accuracy, precision, recall, and f-measure was obtained using ADASYN-N with AdaBoost-CART.

Keywords: Adaptive synthetic-nominal; Class imbalance learning; Ensemble classifier; Over-sampling; Synthetic minority oversampling technique-nominal

This is an open access article under the CC BY-SA license.
Corresponding Author:
Yulia Ery Kurniawati
Department of Informatics, Faculty of Computer Science and Design, Institut Teknologi dan Bisnis Kalbis
Jalan Pulomas Selatan Kav 22, Kayu Putih, Pulogadung, Jakarta Timur 13210, Indonesia
Email: yulia.kurniawati@kalbis.ac.id

1. INTRODUCTION
One of the problems of machine learning and data mining is imbalanced data. Imbalance occurs when the number of examples of each class in the dataset is disproportionate [1], and the under-represented classes are usually the most essential and needed entities. The issue becomes more complicated in multiclass problems, where it is hard to know a priori which of the multiple majority and minority classes should be stressed during the learning stage. Machine learning algorithms have difficulty classifying minority classes, that is, the classes with the fewest instances, because they assume that the class distribution is balanced. As a result, the classifier tends to focus on the majority class and ignore the minority class, which leads to errors in classifying minority-class instances. Imbalanced data can be found in many areas such as medicine [2], [3], abnormal electricity consumption [4], price forecasting [5], credit evaluation [6], and cyanobacteria bloom [7].

There are two main approaches to dealing with class imbalance, the data-level approach and the algorithmic approach, as well as hybrid approaches [8], [9]. The data-level approach can use sampling methods, which fall into two groups: sampling of the minority class (over-sampling) [10], [11] and sampling of the majority class (under-sampling) [12], [13]. Meanwhile, the algorithmic approach designs new algorithms or refines existing ones, and it can use ensemble methods.
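The two sampling directions of the data-level approach can be illustrated with a minimal Python sketch. It uses plain random duplication and random removal, which are simpler stand-ins and not the SMOTE-N or ADASYN-N algorithms used in this study. The function names, the toy rows, and the class labels are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (illustrative only)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """Keep a random subset of each class so that every class has as many
    rows as the smallest class (illustrative only)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data with nominal attributes: 6 majority, 2 minority rows.
rows = [("a", "x")] * 6 + [("b", "y"), ("b", "z")]
labels = ["normal"] * 6 + ["abnormal"] * 2

over_rows, over_labels = random_over_sample(rows, labels)
under_rows, under_labels = random_under_sample(rows, labels)

print(Counter(labels))        # class counts before sampling
print(Counter(over_labels))   # class counts after over-sampling
print(Counter(under_labels))  # class counts after under-sampling
```

Over-sampling keeps all original rows and grows the minority class, while under-sampling discards majority rows. With small laboratory datasets, discarding rows can remove a large share of the available information, which is one common reason to prefer over-sampling in such settings.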
Ensemble methods use one set of classifiers to make a