Regular article

Combined data mining/NIR spectroscopy for purity assessment of lime juice

Sahameh Shafiee, Saeid Minaei ⇑
Biosystems Engineering Department, Tarbiat Modares University, Tehran, Iran

Highlights
Structural differences in NIR spectral data can be utilized to distinguish natural and synthetic lemon juices.
Feature reduction did not lead to more accurate results when data mining.
The SVM classifiers show the best performance for sample discrimination using NIR spectroscopy.

Article info
Article history:
Received 12 October 2017
Accepted 27 April 2018
Available online 30 April 2018

Keywords:
NIR spectroscopy
Genetic algorithm
Support vector machine
Random forest
Radial basis function network

Abstract
This paper reports a data mining study of the NIR spectra of lime juice samples to determine their purity (natural or synthetic). NIR spectra for 72 pure and synthetic lime juice samples were recorded in reflectance mode. Sample outliers were removed using principal component analysis (PCA). Different data mining techniques were employed for feature selection (Genetic Algorithm (GA)) and classification (the radial basis function (RBF) network, Support Vector Machine (SVM), and Random Forest (RF)). SVM proved to be the most accurate classifier, achieving the highest accuracy (97%) using the raw spectrum information. The classifier accuracy dropped to 93% when the feature vector selected by the GA search method was used as the classifier input. It can be concluded that feature selection removes some relevant features that produce good performance with the SVM classifier. Also, spectra reduced using PCA do not show acceptable performance (total accuracy of 66% by RBFNN), which indicates that dimensionality reduction methods such as PCA do not always lead to more accurate results.
These findings demonstrate the potential of combining data mining with near-infrared spectroscopy for monitoring lime juice quality, specifically for distinguishing natural from synthetic products.
© 2018 Published by Elsevier B.V.

1. Introduction

Adulteration of fruit juice for economic gain is a well-established malpractice. The major components of fruit juice are water, sugars, and organic acids, with lesser amounts of amino acids, vitamins, and phenolic compounds [1–4]. Lime juice is a popular fruit juice used in cooking and at the table. The considerable demand for lime juice and recent weather conditions have led to variability in the supply of fresh limes, giving unscrupulous producers the incentive to dilute their products with water and then add citric acid to compensate for the loss of flavor. Some producers go as far as preparing a completely synthetic product, which poses health hazards to consumers. The incidence of lime juice fraud is on the rise and, with increased globalization, any single adulteration event can affect a larger and wider population than ever [3].

Titratable acidity and soluble solids content are the two major quality indicators for fruit juice in the food industry [3]. Total titratable acidity includes all the acidic substances in juice that react with sodium hydroxide [5]. In most cases, this value represents the organic acid content of a given juice. Organic acids are important constituents of the soluble solids of citrus juice, and in lemons and limes they become the principal soluble constituents. Citric acid is the characteristic and predominant acid in citrus fruits, and it is accompanied by malic acid and some other minor acids [1]. Titratable acidity and Brix value may be used as indicators to detect whether the juice has been diluted with too much water, which is the simplest adulteration practice.
However, since these values are easy to measure, fraudsters commonly dilute fruit juice with water containing sugar and citric acid. Comprehensive analytical approaches for detecting the changes in chemical composition associated with fruit juice adulteration are therefore needed [6]. Some methods have been utilized to

⇑ Corresponding author.
E-mail addresses: s.shafiee80@gmail.com (S. Shafiee), Minaee@modares.ac.ir (S. Minaei).
https://doi.org/10.1016/j.infrared.2018.04.012
1350-4495/© 2018 Published by Elsevier B.V.
Infrared Physics & Technology 91 (2018) 193–199
journal homepage: www.elsevier.com/locate/infrared