Chemical Fingerprinting of Petroleum Biomarkers Using Time Warping and PCA

JAN H. CHRISTENSEN,*,†,‡ GIORGIO TOMASI,# AND ASGER B. HANSEN‡

Department of Environmental Chemistry and Microbiology, National Environmental Research Institute, Frederiksborgvej 399, P.O. Box 358, 4000 Roskilde, Denmark, Department of Life Sciences and Chemistry, Roskilde University, Universitetsvej 1, P.O. Box 260, 4000 Roskilde, Denmark, and Department of Food Science, The Royal Veterinary & Agricultural University, Frederiksberg C, Denmark

A new method for chemical fingerprinting of petroleum biomarkers is described. The method consists of GC-MS analysis, preprocessing of GC-MS chromatograms, and principal component analysis (PCA) of selected regions. The preprocessing consists of baseline removal by derivatization, normalization, and alignment using correlation optimized warping. The method was applied to chromatograms of m/z 217 (tricyclic and tetracyclic steranes) of oil spill samples and source oils. Oil spill samples collected from the coastal environment in the weeks after the Baltic Carrier oil spill were clustered in principal components 1 to 4 with oil samples from the tank of the Baltic Carrier (source oil). The discriminative power of PCA was enhanced by deselecting the most uncertain variables or scaling them according to their uncertainty, using a weighted least squares criterion. The four principal components were interpreted as follows: boiling point range (PC1), clay content (PC2), carbon number distribution of sterols in the source rock (PC3), and thermal maturity of the oil (PC4). In summary, the method allows for analyses of chromatograms using a fast and objective procedure and with more comprehensive data usage compared to other fingerprinting methods.

Introduction

Chemical fingerprinting is a collection of techniques that trace the origin of a sample (e.g. a pollutant) based on its chemical composition.
In forensic oil spill identification and in geochemistry, petroleum biomarkers are widely used for this purpose (1-3). Oil contains a large number of biomarkers, of which terpanes and steranes are among the most abundant in crude oils. The relative content of biomarker compounds in source rocks, and hence crude oils, depends on source, maturation, and in-reservoir weathering and biodegradation processes (2). Furthermore, these compounds are recalcitrant when released to the environment following oil spills. Thus, they are useful for oil/oil and oil/source rock correlation purposes (4-6).

Gas chromatography-mass spectrometry (GC-MS) is the standard method for the analysis of petroleum biomarkers (1-3). The associated chromatograms contain a considerable amount of information relevant to chemical fingerprinting but can be complex with peaks that coelute. Consequently, standard peak quantification procedures are associated with large variability and often fail to extract high quality data. Peak separation can be improved by using longer capillary columns or more sophisticated mass spectrometry methods; e.g. high-resolution GC-MS and GC-MS-MS have been found particularly useful for improved resolution and identification of biomarker compounds (7). However, such instrumentation is not widespread in the scientific community, and it is cumbersome to identify and quantify large numbers of peaks as a means to compare oil spill samples and source oils. Consequently, some chemical information is typically ignored, and chemical fingerprinting focuses on few descriptive variables, e.g. diagnostic ratios (3, 4, 8). Moreover, chromatographic data preprocessing, which includes peak identification, quantification, and quality control, is time-consuming and often requires subjective decisions.

Chemometric methods such as principal component analysis (PCA) provide useful tools for more extensive analyses of chromatographic data (5, 9, 10).
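As a rough illustration of how PCA can be applied to chromatographic data, the sketch below fits a principal component model to a matrix whose rows are chromatograms sampled on a common retention-time grid and then projects further samples onto it. It is not the authors' implementation: the function names `fit_pca` and `project`, the two-peak synthetic signals, and the choice of two components are assumptions made only for this example, and the signals are assumed to be already aligned.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA by SVD of the column-centred matrix X (samples x variables)."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = Vt[:n_components].T                    # variables x components
    scores = U[:, :n_components] * s[:n_components]   # samples x components
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return mean, loadings, scores, explained

def project(X_new, mean, loadings):
    """Project new samples onto an existing PCA model."""
    return (X_new - mean) @ loadings

# Synthetic "chromatograms" (hypothetical data): two Gaussian peaks whose
# relative heights vary between samples, plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
peak_a = np.exp(-0.5 * ((t - 0.3) / 0.02) ** 2)
peak_b = np.exp(-0.5 * ((t - 0.7) / 0.02) ** 2)
ratios = rng.uniform(0.2, 0.8, size=12)
X = np.outer(ratios, peak_a) + np.outer(1.0 - ratios, peak_b)
X += rng.normal(scale=0.005, size=X.shape)

mean, loadings, scores, explained = fit_pca(X, n_components=2)
new_scores = project(X[:3], mean, loadings)  # scores of samples treated as "new"
```

Because the only systematic variation in the synthetic data is the ratio of the two peaks, most of the variance is expected to fall on the first component, whose loading contrasts the two peak regions. Real chromatograms would first need the retention time shifts removed for the columns of `X` to be comparable across samples.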
However, when applied to quantitative data these methods are still affected by the limitations described in the previous paragraph. Thus, our primary aim was to develop an objective method for chemical fingerprinting by avoiding initial peak identification and quantification and instead performing PCA on the digitized chromatograms. The most severe impediment to such an approach is the inevitable retention time shift caused largely by deterioration of the capillary column (11). The correlation optimized warping algorithm (COW) (12) has been successfully employed to realign chromatograms from GC-FID (13), HPLC (12), and LC-MS (14); here, it is combined with PCA into a method for chemical fingerprinting of petroleum biomarkers. PCA allows for chemical interpretation of the results, which is needed for confirming the observed correlation of oil samples.

The method consists of two parts: preprocessing and chemometric data analysis. Preprocessing comprises derivatization, normalization, and alignment (which includes selection of a target chromatogram, optimization of the warping parameters, and warping of the sample chromatograms). In the chemometric analysis, the data are first divided into a calibration set of source oils, a set of reference oils (the ‘reference set’), and a test set containing spill samples; then, a principal component model is fitted to the calibration set and optimized on the basis of the reference set; finally, the test set is projected on the model and the spill samples are matched to the source oils.

The method was applied to 101 chromatograms of m/z 217, which includes tricyclic and tetracyclic steranes (15) and other compounds yet unidentified (Figure 1). Tetracyclic steranes have been used frequently for chemical fingerprinting (4, 5), but many peaks coelute and hence only a fraction of these is commonly employed for forensic oil spill identification (3, 4, 8).

Methods and Materials

Experimental.
The oil samples used in the analysis were all part of the oil database at the forensic oil spill laboratory, National Environmental Research Institute, DK. The database consists of crude oils, refined products, oil mixtures, and spill samples from oil spill cases during the last 10 years. A subgroup of these oils was dissolved separately in dichlo-

* Corresponding author phone: +45-46301200; fax: +45-46301114; e-mail: jch@dmu.dk.
† National Environmental Research Institute.
‡ Roskilde University.
# The Royal Veterinary & Agricultural University.

Environ. Sci. Technol. 2005, 39, 255-260
10.1021/es049832d CCC: $30.25 2005 American Chemical Society
Published on Web 11/19/2004
VOL. 39, NO. 1, 2005 / ENVIRONMENTAL SCIENCE & TECHNOLOGY 255