Classification Tools for Carotenoid Content
Estimation in Manihot esculenta via Metabolomics
and Machine Learning
Rodolfo Moresco¹(✉), Telma Afonso³, Virgílio G. Uarrota¹, Bruno Bachiega Navarro¹,
Eduardo da C. Nunes², Miguel Rocha³, and Marcelo Maraschin¹
¹ Plant Morphogenesis and Biochemistry Laboratory, Federal University of Santa Catarina,
Florianopolis, Brazil
rodolfo_moresco@yahoo.com.br
² Santa Catarina State Agricultural Research and Rural Extension Agency (EPAGRI),
Experimental Station of Urussanga, Urussanga, Brazil
³ Centre of Biological Engineering, School of Engineering, University of Minho, Braga, Portugal
Abstract. Cassava genotypes (Manihot esculenta Crantz) with high pro-vitamin
A activity have been identified as a means to reduce the prevalence of vitamin A
deficiency. The color of cassava roots, which varies from white to red, is related
to the presence of several carotenoid pigments. The present study shows how
CIELAB color measurement on cassava root tissue can be used as a
non-destructive and very fast technique to quantify the levels of carotenoids
in cassava root samples, avoiding the use of more expensive analytical techniques
for compound quantification, such as UV-visible (UV-vis) spectrophotometry and
HPLC. To this end, we used machine learning techniques, associating the
colorimetric data (CIELAB) with the data obtained by UV-vis and HPLC, to obtain
predictive models of carotenoid content for this type of biomass. The best values
of R² (above 90%) were observed for the predictive variable total carotenoid
content (TCC) determined by UV-vis spectrophotometry. When we tested the machine
learning models using the CIELAB values as inputs for the total carotenoid content
quantified by HPLC, the Partial Least Squares (PLS), Support Vector Machines, and
Elastic Net models presented the best values of R² (above 40%) and Root-Mean-Square
Error (RMSE). For the carotenoid quantification by UV-vis spectrophotometry, the R²
(around 60%) and RMSE (around 6.5) values were more satisfactory, with Ridge
regression and Elastic Net showing the best results. It can be concluded that the
colorimetric technique (CIELAB), associated with UV-vis/HPLC and predictive
statistical analysis through machine learning, can predict the content of total
carotenoids in these samples with good precision and accuracy.
Keywords: Chemometrics · Descriptive models · Machine learning · Cassava
genotypes · Carotenoids · HPLC · UV-vis
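As an illustration of the modeling setup summarized in the abstract (CIELAB values as inputs, total carotenoid content as the response, evaluation by R² and RMSE), the following is a minimal sketch only, not the authors' implementation. It fits a ridge regression, one of the model families named above, in closed form. The data are synthetic: the value ranges, the linear relationship, the noise level, the penalty `alpha`, and every function name are assumptions made for this example and carry no information about the real cassava measurements.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Ridge regression on centered features; returns (weights, intercept)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations: (Xc^T Xc + alpha * I) w = Xc^T yc
    gram = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve_linear(gram, rhs)
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, intercept


def predict(X, w, intercept):
    return [sum(wj * xj for wj, xj in zip(w, row)) + intercept for row in X]


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))


def make_synthetic(n, seed=0):
    """Synthetic (L*, a*, b*) rows and a made-up carotenoid response."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        l_star = rng.uniform(60.0, 90.0)
        a_star = rng.uniform(-5.0, 15.0)
        b_star = rng.uniform(10.0, 50.0)
        # Assumed linear relationship plus noise, for illustration only.
        tcc = 0.5 * b_star + 0.3 * a_star - 0.05 * l_star + rng.gauss(0.0, 2.0)
        X.append([l_star, a_star, b_star])
        y.append(tcc)
    return X, y


X, y = make_synthetic(160)
X_train, y_train = X[:120], y[:120]   # simple hold-out split
X_test, y_test = X[120:], y[120:]

weights, intercept = fit_ridge(X_train, y_train, alpha=1.0)
r2, rmse = r2_and_rmse(y_test, predict(X_test, weights, intercept))
print(f"hold-out R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

With real data, the synthetic generator would be replaced by measured CIELAB coordinates and the UV-vis or HPLC carotenoid values, and the penalty would normally be chosen by cross-validation rather than fixed in advance.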
© Springer International Publishing AG 2017
F. Fdez-Riverola et al. (eds.), 11th International Conference on Practical
Applications of Computational Biology & Bioinformatics, Advances in Intelligent
Systems and Computing 616, DOI 10.1007/978-3-319-60816-7_34