Tile-based Fisher-ratio software for improved feature selection analysis of comprehensive two-dimensional gas chromatography–time-of-flight mass spectrometry data

Luke C. Marney a, W. Christopher Siegler a,1, Brendon A. Parsons a, Jamin C. Hoggard a, Bob W. Wright b, Robert E. Synovec a,*

a Department of Chemistry, University of Washington, P.O. Box 351700, Seattle 98198, WA, USA
b Pacific Northwest National Laboratory, Battelle Boulevard, P.O. Box 999, Richland 99352, WA, USA

Article info
Article history: Received 13 March 2013; Received in revised form 18 June 2013; Accepted 21 June 2013; Available online 28 June 2013
Keywords: Fisher ratio; Comprehensive two-dimensional gas chromatography; Time-of-flight mass spectrometry; Chemometrics; Feature selection

Abstract
Comprehensive two-dimensional (2D) gas chromatography coupled with time-of-flight mass spectrometry (GC×GC–TOFMS) is a highly capable instrumental platform that produces complex, information-rich multi-dimensional chemical data. The data can be initially overwhelming, especially when many samples (of various sample classes) are analyzed with multiple injections for each sample. Thus, the data must be analyzed in such a way as to extract the most meaningful information. The pixel-based and peak table-based Fisher ratio algorithmic approaches have been used successfully in the past to reduce the multi-dimensional data down to those chemical compounds that are changing between the sample classes relative to those that are not changing (i.e., chemical feature selection). We report on the initial development of a novel, computationally fast tile-based Fisher-ratio software that addresses the challenges posed by 2D retention time misalignment without explicitly aligning the data; sensitivity to such misalignment is often a shortcoming of both pixel-based and peak table-based algorithmic approaches.
Concurrently, the tile-based Fisher-ratio algorithm significantly improves the sensitivity contrast of true positives against a background of potential false positives and noise. In this study, eight compounds, plus one internal standard, were spiked into diesel at various concentrations. The tile-based F-ratio algorithmic approach was able to discover all spiked analytes within the complex diesel sample matrix, which contains thousands of potential false positives, in each possible concentration comparison, even at the lowest absolute spiked analyte concentration ratio of 1.06, i.e., the ratio of the analyte concentration in the spiked diesel sample to its native concentration in diesel.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Multi-dimensional chromatographic instrumentation produces information-rich, chemically complex multi-dimensional data in which meaningful chemical signals are often buried in a background of less meaningful chemical signal and noise. Experiments can be designed to analyze the similarities and differences between multiple injections of different samples, producing a data set of even higher dimensionality. With the aid of computer software, scientists need to be able to analyze multi-dimensional data sets quickly, easily and comprehensively, so that important analytes and/or chemical fingerprints can be gleaned during discovery-based experimentation. Comprehensive two-dimensional (2D) gas chromatography coupled with time-of-flight mass spectrometry (GC×GC–TOFMS) is a prominent multi-dimensional separation technique that has been used extensively for discovery-based experimentation, especially when the chemical species of interest are sufficiently volatile or amenable to derivatization [1–8]. To address these data analysis challenges, chemometric software for GC×GC–TOFMS data, as well as for data from other multi-dimensional separation techniques, is available and continues to be developed [9,10].
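The Fisher-ratio feature selection summarized in the abstract can be illustrated with a minimal sketch. It assumes the standard one-way ANOVA form of the Fisher ratio (between-class variance divided by within-class variance). It compares a single-pixel feature with a feature formed by summing signal over a rectangular tile of one m/z chromatogram. The tiling scheme (non-overlapping windows of fixed size), the function names `fisher_ratio`, `tile_sums` and `make_chrom`, and the toy data are hypothetical and serve only as illustration. They are not the authors' software, which may define, size and combine tiles differently.

```python
from statistics import mean


def fisher_ratio(classes):
    """One-way ANOVA Fisher ratio for one feature.

    classes: list of lists; each inner list holds the feature's value for
    every replicate injection of one sample class.
    Returns between-class variance / within-class variance.
    """
    k = len(classes)                          # number of sample classes
    values = [v for c in classes for v in c]
    n_total = len(values)                     # total number of injections
    grand_mean = mean(values)
    ss_between = sum(len(c) * (mean(c) - grand_mean) ** 2 for c in classes)
    ss_within = sum((v - mean(c)) ** 2 for c in classes for v in c)
    var_between = ss_between / (k - 1)
    var_within = ss_within / (n_total - k)
    return var_between / var_within if var_within > 0 else float("inf")


def tile_sums(chrom, t1, t2):
    """Sum a 2D chromatogram (rows: 1D time, cols: 2D time) over
    non-overlapping t1 x t2 tiles (hypothetical tiling scheme)."""
    n1, n2 = len(chrom), len(chrom[0])
    return [
        [
            sum(chrom[i][j]
                for i in range(i0, min(i0 + t1, n1))
                for j in range(j0, min(j0 + t2, n2)))
            for j0 in range(0, n2, t2)
        ]
        for i0 in range(0, n1, t1)
    ]


def make_chrom(size, pos, height):
    """Toy single-m/z chromatogram: one one-pixel peak on a zero baseline."""
    chrom = [[0.0] * size for _ in range(size)]
    i, j = pos
    chrom[i][j] = height
    return chrom


# The same peak drifts by one pixel between the three replicate injections.
positions = [(2, 2), (2, 3), (3, 2)]
class_a = [make_chrom(8, p, h) for p, h in zip(positions, [10, 11, 9])]
class_b = [make_chrom(8, p, h) for p, h in zip(positions, [20, 21, 19])]

# Pixel-based feature: signal at the single pixel (2, 2).
f_pixel = fisher_ratio([[c[2][2] for c in class_a],
                        [c[2][2] for c in class_b]])

# Tile-based feature: summed signal in the 4 x 4 tile containing the peak.
f_tile = fisher_ratio([[tile_sums(c, 4, 4)[0][0] for c in class_a],
                       [tile_sums(c, 4, 4)[0][0] for c in class_b]])

print(f"pixel F = {f_pixel:.2f}, tile F = {f_tile:.2f}")
# prints: pixel F = 0.20, tile F = 150.00
```

In this toy data, the peak drifts by one pixel between replicate injections. The drift scatters its signal across neighboring pixels, so the single-pixel Fisher ratio is small even though the class means differ twofold. The tile sum, by contrast, captures the whole peak in every replicate and yields a large Fisher ratio. Under the stated assumptions, this shows how summing over a tile can tolerate modest retention time misalignment without explicit alignment.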
For discovery-based experimentation with GC×GC–TOFMS, the 2D misalignment of peaks across different samples makes non-targeted analysis difficult [11]. Some alignment algorithms have been developed for point-by-point (pixel-level) data [11–14], while others have been developed for peak table-based data. Current use of these algorithms has been recently reviewed [10]. Briefly, data warping and interpolation are used to stretch and compress data in order to objectively optimize the match between analyte peaks in a "target" GC×GC–TOFMS separation and the analyte peaks in a "sample" separation. The application of 2D

Talanta 115 (2013) 887–895. ISSN 0039-9140. http://dx.doi.org/10.1016/j.talanta.2013.06.038
* Corresponding author. Tel.: +1 206 685 2328; fax: +1 206 685 8665. E-mail address: synovec@chem.washington.edu (R.E. Synovec).
1 Current address: Dow Chemical Corporation, 2301 North Brazosport Boulevard, Freeport 77541, TX, USA.