Variable Selection Applied to the Development of a Robust Method for the Quantification of Coffee Blends Using Mid Infrared Spectroscopy

Camila Assis 1 & Leandro S. Oliveira 2 & Marcelo M. Sena 1,3

* Marcelo M. Sena, marcsen@ufmg.br
1 Departamento de Química, ICEx, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
2 Departamento de Engenharia Mecânica, Escola de Engenharia, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
3 Instituto Nacional de Ciência e Tecnologia em Bioanalítica, Campinas, SP 13083-970, Brazil

Received: 7 June 2017 / Accepted: 20 August 2017
© Springer Science+Business Media, LLC 2017
Food Anal. Methods, DOI 10.1007/s12161-017-1027-7

Abstract This paper combined attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR), multivariate calibration with partial least squares (PLS), and different variable selection methods for the development of models to determine Robusta-Arabica coffee blends in the analytical range from 0.0 to 33.0% w/w. Ground samples of different origins were roasted at three different levels: light, medium, and dark. Specific models were built for each roasting level, and a robust model including all the samples was also obtained. Mid infrared spectra were recorded in the wavenumber range between 4000 and 800 cm⁻¹ for the 120 samples used in the models. Four variable selection methods were tested: genetic algorithm (GA), ordered predictors selection (OPS), successive projections algorithm (SPA), and interval PLS (iPLS). The best results were obtained using GA and OPS, which decreased root mean square errors of prediction (RMSEP) by 44–68% as compared to full spectra models. The best robust model was obtained with OPS, providing an RMSEP of 1.8% w/w. The number of selected variables in the optimized models varied from 6.5 to 17.0% of the total number of original variables. This demonstrated the importance of selecting a limited number of wavenumbers richer in information specifically related to the analytes. All the methods were validated by estimating appropriate figures of merit and were considered accurate, linear, sensitive, and unbiased.

Keywords Coffee blends · FTIR · Multivariate calibration · PLS regression · Variable selection · Food authentication

Introduction

Coffee is one of the most popular beverages in the world and has one of the greatest socio-economic impacts among the activities involved in the agricultural trade. Thus, coffee can be classified as a major global commodity (Grinshpun 2014). Such popularity is mainly due to its beneficial physiological effects on health, good taste, intense flavor, and attractive aroma (Ludwig et al. 2014). Recent data indicate that global coffee exports amounted to 9.13 million bags in October 2016, with an estimated 0.9% increase in global coffee production in 2015/2016 as compared to 2014/2015 (ICO 2016), showing that this market has a strong growth trend. In this scenario, Brazil has a prominent role, being the world's largest producer, responsible for about 30% of the world's production.

In general terms, the genus Coffea comprises approximately 70 species (Ludwig et al. 2014). From an economic point of view, the two most important species grown in the world are Arabica (Coffea arabica) and Robusta (Coffea canephora). Arabica coffee accounts for approximately 64% of the world production, with the remaining 36% derived from Robusta coffee (Barbin et al. 2014; Damatta et al. 2007). The two species differ not only in their botanical characteristics and chemical composition, but also in commercial value, with Arabica coffees presenting 20–25% higher market prices. Arabica coffee is originally from eastern Ethiopia and cultivated only in areas above 800 m of altitude and with mild