Please cite this article in press as: S. Shrestha, et al., Single seed near-infrared hyperspectral imaging in determining tomato (Solanum lycopersicum L.) seed quality in association with multivariate data analysis, Sens. Actuators B: Chem. (2016), http://dx.doi.org/10.1016/j.snb.2016.08.170

Single seed near-infrared hyperspectral imaging in determining tomato (Solanum lycopersicum L.) seed quality in association with multivariate data analysis

Santosh Shrestha a, Matej Knapič b, Uroš Žibrat b, Lise Christina Deleuran a, René Gislum a,*

a Aarhus University, Department of Agroecology, Slagelse, 4200, Denmark
b Agricultural Institute of Slovenia, Ljubljana, Slovenia

Article info

Article history: Received 26 February 2016; Received in revised form 12 August 2016; Accepted 29 August 2016; Available online xxx

Keywords: PCA; PLS-DA; Chemometrics; Seed quality; Variety identification and seed viability

Abstract

Near-infrared (NIR) hyperspectral imaging was explored as a rapid and non-destructive method of investigating seed quality parameters such as seed viability and variation in tomato seed lots. The seed lots differed in year of production and variety. Four tomato varieties (Cal J, Monprecus, NCL and Chiuri) from 2013, 2014 and 2015 were used in the study. The extracted NIR hyperspectral data from 975 to 2500 nm were analysed by principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA). The PCA revealed no distinct pattern of separation between viable and non-viable tomato seeds. Our findings did show a pattern of separation among the tomato seed lots due to production year and variety.
The PLS-DA predicted varietal class membership with 100 percent accuracy when only the seeds of a single harvest year were included in the model. On pooled samples (all seeds from all varieties), the PLS-DA predicted varietal class membership with an accuracy ranging from 34 to 88 percent. High variation in the seed lots could have caused the high variation in the predicted varietal class membership. The NIR regions carrying chemical information from C-H, N-H and O-H bonds influenced the PCA and PLS-DA models. The study presents the prospects of using NIR hyperspectral imaging in varietal identification studies of tomato seeds, though we recommend a thorough validation of the models.

© 2016 Elsevier B.V. All rights reserved.

* Corresponding author. E-mail address: rg@agro.au.dk (R. Gislum).

1. Introduction

Tomato (Solanum lycopersicum L.) is an economically important horticultural crop and is known for its diverse consumption patterns, such as fresh in salads and processed in, for example, ketchups and paste. Over the past decade, worldwide production of tomato has increased by nearly 40% [1]. This has been achieved through intensive breeding programmes aimed at developing new varieties with high yield potential and at introgressing the desirable flavour and texture traits required to meet the global demand [2]. More tomato varieties are available worldwide than for any other vegetable crop [3]. The modern tomato varieties have a narrow genetic base; as a consequence, there is reduced phenotypic variation among the varieties [4]. Owing to this, it has become difficult to measure the distinctness, uniformity and stability (DUS) traits of a newly submitted variety, as required for registration and the granting of plant variety protection (PVP) [5]. In addition, the success of any high yielding variety depends on maintaining its varietal purity, which is of high importance in the seed trade.
Therefore, assessing the purity of the commercial varieties is essential for any seed company before the varieties reach farmers' fields [4]. Biochemical markers based on isozymes and proteins [6,7] and DNA-based molecular markers [4,5,8] are often employed to investigate varietal identity and genetic purity. Even though these methods are very precise, they are also time-consuming and destructive.

Many tomato varieties have irregular fruit maturity due to continuous and non-uniform flowering, which makes it difficult to determine the optimum time for seed harvest and leads to a mixture of seeds of varying maturity [9,10]. The position on the mother plant of the harvested fruits has also been found to add variation to seed quality [11]. Therefore, the final seed lot contains a mixture of seeds with varying germination ability. The variation in germination ability within a seed lot results in uneven or poor plant