Journal of Chromatography A, 1216 (2009) 8404–8420
journal homepage: www.elsevier.com/locate/chroma

Gas chromatographic quantitative structure–retention relationships of trimethylsilylated anabolic androgenic steroids by multiple linear regression and partial least squares

A.G. Fragkaki a,b, A. Tsantili-Kakoulidou c, Y.S. Angelis a, M. Koupparis b, C. Georgakopoulos a,*

a Doping Control Laboratory of Athens, Olympic Athletic Center of Athens “Spyros Louis”, Kifisias 37, 15123 Maroussi, Greece
b Laboratory of Analytical Chemistry, Department of Chemistry, University of Athens, Panepistimioupolis, Zografou, 15771 Athens, Greece
c Department of Pharmaceutical Chemistry, School of Pharmacy, University of Athens, Panepistimioupolis, Zografou, 15771 Athens, Greece

Article info

Article history: Received 25 May 2009; received in revised form 8 September 2009; accepted 25 September 2009; available online 2 October 2009

Keywords: Anabolic androgenic steroids; Doping control; Quantitative structure–retention relationships; Principal component analysis; Multiple linear regression; Partial least squares

Abstract

A quantitative structure–retention relationship (QSRR) study has been performed to correlate relative retention times (RRTs) of trimethylsilylated (TMS) anabolic androgenic steroids (AAS) with their molecular characteristics, encoded by the respective descriptors, for the prediction of RRTs of novel molecules, using gas chromatography time-of-flight mass spectrometry (GC-TOF-MS). The elucidation of similarities and dissimilarities among the data structures was carried out using principal component analysis (PCA). Successful models were established using multiple linear regression (MLR) and partial least squares (PLS) techniques as a function of topological, three-dimensional (3D) and physicochemical descriptors.
The models are useful for the estimation of RRTs of designer steroids for which no analytical data is available.

© 2009 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +30 210 6834567; fax: +30 210 6834021. E-mail address: oaka@ath.forthnet.gr (C. Georgakopoulos).

1. Introduction

Quantitative structure–retention relationships (QSRRs) represent a powerful technique for relating the gas chromatographic retention parameters of groups of analytes and their descriptors, which are quantities encoding the structural characteristics [1–3]. The most commonly used retention parameters in gas chromatography are the retention times (RTs), the relative retention times (RRTs), the Kováts retention indices and the logarithms of retention volumes of analytes [4]. The QSRR approach can be applied to identify the most useful structural descriptors, to predict retention for a new analyte, to gain insight into the molecular mechanism of chromatographic separation, to quantitatively compare separation properties of individual types of chromatographic columns and to evaluate properties other than chromatographic, such as lipophilicity.

The construction of predictive QSRR models involves three steps [5]:

(a) the acquisition of a sufficiently large set of retention data of analytes covering possible structural diversities within a defined group of substances,
(b) the calculation of structural descriptors of the analytes, such as topological, three-dimensional (3D; geometrical and electronic) and physicochemical,
(c) the correlation of the retention data (dependent variable) with the calculated descriptors (independent variables) using appropriate statistical methods.

Multiple linear regression (MLR) is one of the most frequently applied methods in generating QSRR models [6].
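Steps (a) to (c) can be illustrated with a minimal MLR sketch. Everything in it is an assumption made for illustration: the two descriptor columns, the coefficients used to generate the synthetic RRTs, and the helper names `fit_mlr` and `predict_rrt` are hypothetical and are not taken from this study. The fit uses ordinary least squares solved through the normal equations, which is one common way to compute an MLR model.

```python
# Illustrative sketch only: all data below are synthetic, not values from this study.

def fit_mlr(descriptors, retention):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(d) for d in descriptors]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, retention))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-9:
            raise ValueError("collinear descriptors: MLR has no unique solution")
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [u - factor * v for u, v in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict_rrt(coefs, descriptor_row):
    """Predicted RRT = intercept + sum of coefficient * descriptor value."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], descriptor_row))


# Step (a): retention data, here generated from assumed coefficients (noise-free).
TRUE_COEFS = (0.40, 0.10, 0.05)  # hypothetical intercept and two slopes
# Step (b): two hypothetical descriptors per analyte.
descriptors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
rrts = [TRUE_COEFS[0] + TRUE_COEFS[1] * d1 + TRUE_COEFS[2] * d2 for d1, d2 in descriptors]

# Step (c): correlate the RRTs with the descriptors, then predict a new analyte.
coefs = fit_mlr(descriptors, rrts)
new_rrt = predict_rrt(coefs, (7.0, 8.0))
```

Because the synthetic RRTs are an exact linear function of the descriptors, the fitted coefficients reproduce the assumed ones up to rounding error. With real retention data the fit would be approximate and would need validation on analytes held out of the training set. The explicit collinearity check raises an error when one descriptor is a linear combination of the others, since the normal equations then have no unique solution.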
The inability of MLR to treat intercorrelated variables and missing data, as well as the fact that it can consider only one dependent variable in each model, can be overcome through the partial least squares (PLS) technique, which is also widely used in QSRR studies. Unlike MLR, PLS can analyze strongly collinear data, reducing the high-dimensional data matrix to a much smaller and interpretable set of principal components or latent variables. Moreover, principal component analysis (PCA) is useful in providing a data overview [7].

0021-9673/$ – see front matter. doi:10.1016/j.chroma.2009.09.066

Anabolic androgenic steroids (AAS) are included in the List of prohibited substances of the World Anti-Doping Agency (WADA) [8]. Chemically modified steroids, otherwise known as designer