Please cite this article in press as: J. Tobin, et al., Untargeted analysis of chromatographic data for green and fermented rooibos: Problem with size effect removal, J. Chromatogr. A (2017), https://doi.org/10.1016/j.chroma.2017.10.024

Journal of Chromatography A, xxx (2017) xxx–xxx

Full length article

Untargeted analysis of chromatographic data for green and fermented rooibos: Problem with size effect removal

Jade Tobin a,b, Jan Walach c, Dalene de Beer a,b, Paul J. Williams b, Peter Filzmoser c, Beata Walczak d,*

a Plant Bioactives Group, Post-Harvest and Wine Technology Division, Agricultural Research Council (ARC), Infruitec-Nietvoorbij, Private Bag X5026, Stellenbosch, 7599, South Africa
b Department of Food Science, Stellenbosch University, Private Bag X1, Matieland, Stellenbosch, South Africa
c Institute of Statistics and Mathematical Methods in Economics, Vienna University of Technology, Vienna, Austria
d University of Silesia, Institute of Chemistry, Szkolna 9, 40-006 Katowice, Poland

Article info

Article history:
Received 14 July 2017
Received in revised form 2 October 2017
Accepted 8 October 2017
Available online xxx

Keywords:
Multivariate analysis of variance
Pre-processing
Target projection
Pairwise log-ratio
Biomarkers identification
Rooibos tea fermentation

Abstract

When analyzing chromatographic data, it is necessary to preprocess them properly before exploration and/or supervised modeling. To make chromatographic signals comparable, it is crucial to remove the scaling effect caused by differences in overall sample concentrations. One efficient method of signal scaling is Probabilistic Quotient Normalization (PQN) [1].
However, it can be applied only to data for which the majority of features do not vary systematically among the studied classes of signals. When studying the influence of the traditional “fermentation” (oxidation) process on the concentration of 56 individual peaks detected in rooibos plant material, this assumption is not fulfilled. In this case, the only possible solution is the analysis of pairwise log-ratios, which are not influenced by the scaling constant. To identify significant features, i.e., peaks differentiating the studied classes of samples (green and fermented rooibos plant material), we propose applying rPLR (robust pairwise log-ratios), introduced by Walach et al. [2]. It allows fast computation and identification of the significant features in terms of the original variables (peaks), which is problematic when working with the unfolded pairwise log-ratios. As demonstrated, it can be applied to designed data sets and, in the case of contaminated data, it still allows proper conclusions to be drawn.
© 2017 Elsevier B.V. All rights reserved.

* Corresponding author. E-mail address: beata.walczak@us.edu.pl (B. Walczak).

1. Introduction

Rooibos herbal tea, made from the indigenous South African fynbos plant Aspalathus linearis (Burm.f.) R.Dahlgren, has gained tremendous popularity on the global market. It is mainly produced in the “fermented” (oxidised) form, with only a small amount of green (unoxidised) herbal tea produced. The health-promoting properties of rooibos, e.g. antioxidant, anti-cancer, antidiabetic, hepatoprotective and anti-inflammatory activities (reviewed by Joubert et al.; Joubert and de Beer) [3,4], are mainly associated with its unique phenolic composition. During traditional processing of rooibos herbal tea, the fermentation step is essential for developing the sought-after flavour and red-brown colour of the tea. However, oxidation of phenolic compounds also occurs, with a large reduction in aspalathin content in particular (Walters et al.) [5]. The phenolic oxidation reactions occurring during fermentation are still poorly understood. Chemometric analysis of chromatographic data from green and fermented rooibos plant material can provide information about which compounds are involved in oxidative reactions during fermentation.

Prior to data analysis, chromatographic fingerprints have to be preprocessed to eliminate all undesired signal components, such as baseline and noise, and properly aligned to the selected target. Additionally, to make them comparable, it is necessary to normalize them (in order to remove the ‘size effect’) and to transform the studied features to stabilize the data variance. All these steps determine whether changes in the concentrations of the plant material components during the fermentation process are identified correctly. In our previous study [6], limited to 16 known standards, it was shown that the fermentation process has a statistically significant influence on the extract composition. However, when working with the standards, it was possible to estimate their concentrations in the studied extracts. When working with the entire fingerprints (untargeted analysis), we have to work with the peak areas instead of concentrations,