Classification of Harvesting Age of Mango Based on NIR Spectra Using Machine Learning Algorithms

Nunik Destria Arianti 1*, Muhamad Muslih 1, Carti Irawan 1, Edo Saputra 2,3, Sariyusda 4, Ramayanty Bulan 5

1 Department of Information System, Nusa Putra University, Sukabumi 43155, Indonesia
2 Department of Agricultural Technology, Faculty of Agriculture, Universitas Riau, Pekanbaru 28293, Indonesia
3 Agricultural Engineering Study Program, IPB University, Bogor 16680, Indonesia
4 Department of Mechanical Engineering, Lhokseumawe State Polytechnic, Lhokseumawe 24301, Indonesia
5 Department of Agricultural Engineering, Faculty of Agriculture, Syiah Kuala University, Banda Aceh 23111, Indonesia

Corresponding Author Email: nunik@nusaputra.ac.id
https://doi.org/10.18280/mmep.100123
Received: 24 October 2022
Accepted: 12 February 2023

ABSTRACT

The established assessment of post-harvest attributes, such as harvesting age, requires destructive sampling, which is expensive and often limited by the availability of fruit on the trees. In contrast, non-destructive assessment of post-harvest attributes using NIR spectral data is fast and reliable, especially for mango. However, NIR spectral data frequently exhibit a non-linear relationship with the reference dataset. Therefore, this study used NIR spectral data to classify the harvesting age of mango fruits with machine learning algorithms. Five supervised machine learning algorithms were explored to generate the classification model: gradient boost (GB), k-nearest neighbor (k-NN), decision tree (DT), random forest (RF), and linear discriminant analysis (LDA). In this study, 237 NIR spectra, covering 1000 to 2500 nm, were measured from mango fruits of the Arumanis cultivar from orchard sites in the Garut district, West Java Province (Indonesia), to determine the appropriate harvest time.
The dataset was randomly divided into training and testing sets of 80% and 20%, respectively. Hyperparameter optimization was performed using the GridSearchCV function from scikit-learn, with the confusion matrix used for evaluation. In general, all five machine learning algorithms were able to classify the harvest age of mango fruit based on NIR spectral data. Ranked by accuracy, from best to worst, the algorithms for classifying the harvest age of mango fruit were DT > GB > LDA > RF > k-NN. Finally, among these established machine learning algorithms, predictions generated using the DT algorithm consistently yielded higher accuracy on the training and testing sets. This study provides a framework for understanding the feasibility of applying machine learning algorithms to NIR spectral data and their accuracy in classifying the harvesting age of mango. In addition, this study shows the importance of assessing the performance of a classification model using the confusion matrix.

Keywords: artificial intelligence, classification, decision tree, near-infrared, postharvest attributes

1. INTRODUCTION

The quality of the fruit, which includes textural attributes and flavor traits, is essential in determining customers' preference for mango. From farmers to packing plants, the evaluation of post-harvest fruit quality typically considers several vital characteristics, such as firmness, soluble solid content, and titratable acidity. According to reports, firmness is a critical element that affects customer acceptability, while the concentration of soluble solids and titratable acidity enhance the consumer experience. In addition, the firmness of climacteric fruits is strongly correlated with their moisture content and post-storage soluble solid content and is generally recognized as a reliable measure of fruit quality [1-4].
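As a brief illustration of the evaluation workflow summarized in the abstract (a random 80%/20% split followed by assessment with a confusion matrix), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the fixed random seed, the rounding rule for the test-set size, and the class labels "early" and "late" are all assumptions introduced here for illustration only.

```python
import math
import random
from collections import Counter


def split_80_20(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into training and testing subsets.

    The seed and the ceil() rounding of the test size are assumptions.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_fraction * len(indices))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def confusion_matrix(y_true, y_pred, classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Fraction of samples that lie on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical labels and predictions (placeholders, not data from the study).
classes = ["early", "late"]
y_true = ["early", "early", "late", "late", "late"]
y_pred = ["early", "late", "late", "late", "early"]
cm = confusion_matrix(y_true, y_pred, classes)
print(cm, accuracy(cm))
```

Reading accuracy off the confusion matrix rather than computing it separately keeps the per-class errors visible, which is the reason the abstract gives for evaluating the classifiers with the confusion matrix.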
Classical measurements of post-harvest characteristics are time-consuming, challenging, and destructive. The fruit available for destructive sampling is often limited on mango farms. Farmers struggle both to collect uniformly mature fruit and to obtain the required quantity of fruit [5]. Furthermore, little is known about selection-specific markers of maturity, which, combined with the usual within-canopy heterogeneity in maturity, can result in highly variable samples.

As a nondestructive alternative, NIR spectroscopy offers a respectable level of precision for assessing internal properties. Because it allows simple, trustworthy, and affordable examination of postharvest features, nondestructive NIR-based prediction has been widely used for various fruits and vegetables. NIR spectroscopy, combined with machine learning and deep learning for data analysis, has attracted great interest in recent years. It has been proven to be a reliable predictor of the quality characteristics of different fruit crops, such as apples, mangoes, and pears [6-9].

Mathematical Modelling of Engineering Problems, Vol. 10, No. 1, February, 2023, pp. 204-211
Journal homepage: http://iieta.org/journals/mmep

Chemometrics is needed to extract and deconvolute detailed physical and chemical data in order to associate destructive observations with nondestructive NIR spectra. Chemometric applications in NIR spectroscopy frequently use partial least-