International Journal of Computer Applications (0975 - 8887)
Volume 57 - No. 16, November 2012

Mango Fruit Quality Prediction using Associative Classification Rules

Rattapol Pornprasit
Postharvest Technology Research Institute, Chiang Mai University, Chiang Mai, Thailand
Postharvest Technology Innovation Center, Commission on Higher Education, Bangkok, Thailand

Juggapong Natwichai
Department of Computer Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai, Thailand

Bowonsak Srisungsittisunti
Department of Computer Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai, Thailand

ABSTRACT
Near-infrared (NIR) spectroscopy is a non-destructive technique that can provide quality measurements for agricultural products. In this paper, we propose an approach that uses the NIR spectrum to predict the quality of mango fruits. The prediction model is based on one of the most prominent machine learning approaches, associative classification. The associative classifiers are trained on the spectrum data of each mango fruit, with a chemical property representing fruit quality as the class label. To predict quality with a classifier, the spectrum of a mango fruit is measured, and the class label is then determined by the classification rules. A series of experiments was conducted under various parameter settings to evaluate the prediction accuracy. The results showed that the highest accuracy was obtained when the number of boxes (the number of partitions of each spectrum used for rule generation) was set to 10, and the minimum support and minimum confidence thresholds were set to 1% and 50%, respectively. Based on these experiments, a guideline for determining optimal parameters is also proposed for practitioners.
General Terms: Classification, Prediction, Machine learning

Keywords: Associative classification rules, Mango fruits, Near-infrared spectroscopy

1. INTRODUCTION
The mango (Mangifera indica L.) is a tropical fruit for which there is high demand in the world market. In 2010, the Office of Agricultural Economics (OAE) of Thailand reported that Thailand had exported 22 million tons of mangoes, generating 505 million Thai Baht in revenue. In 2011, the figures increased to 37.5 million tons and 703 million Thai Baht. Although mango fruits are in high demand, quality classification remains an important issue, since customers cannot taste the fruit to check its ripeness. Furthermore, precise quality classification requires the mango fruits to be destroyed by chemical testing [7, 17].

Near-infrared (NIR) spectroscopy is a non-destructive technique that can provide quality measurements for agricultural products. NIR spectroscopy was first used in agricultural applications by Norris to measure moisture in grains [13]. Since then, it has been used for rapid analysis of mainly the moisture, protein, and fat content of a wide variety of agricultural and food products [4, 6]. For mango fruits, the use of NIR was first reported by Guthrie and Walse, who assessed dry matter (DM) [7]. Subsequently, NIR has been applied to mango fruits in various ways.

Typically, the NIR reflectance information in the spectrum of a sample fruit is used to predict the chemical composition of that sample by extracting the relevant information from many overlapping peaks. The predicted chemical composition is then interpreted as the quality. Before NIR quality measurement can be applied, the system has to be calibrated to give accurate results. In general, calibration is difficult because of the complex nature of NIR reflectance spectra.
In these spectra, each spectral region of interest is almost completely overlapped by the others. The calibrated models require routine checking to improve accuracy and reduce estimation errors [14].

Generally, statistical analysis is used to analyze the spectrum data and discover the relationship between the spectrum and the chemical properties. After the particular spectrum is identified, the analysis proceeds. Multiple linear regression (MLR), principal component regression (PCR), and partial least squares (PLS) are often applied for the calibration [11]. However, these methods might not be appropriate in practice, since the sample data can be updated. Moreover, the calibrated prediction model can be affected by changing environments, which can cause errors in the analysis.

An approach to building the prediction model that can avoid these problems is machine learning. The prediction model can be updated with additional samples without re-learning from the whole sample set. Additionally, more samples can help make the prediction models more robust to changes in the environment.

In this paper, we propose an approach that applies one of the most prominent machine learning approaches, associative classifiers [10], to mango fruit quality prediction using NIR. This approach is based on the Apriori algorithm [2]. Associative classifiers are rule-based classifiers derived from the training dataset. Each rule in the prediction model has to satisfy the predefined minimum support and minimum confidence constraints. In this work, the prediction models, or classifiers, are built by first determining the frequent items found in the NIR spectra. In which an item in the context of the NIR quality measurement