Journal of Chromatography A, 1192 (2008) 157–165

No-alignment-strategies for exploring a set of two-way data tables obtained from capillary electrophoresis–mass spectrometry

M. Daszykowski a, R. Danielsson b, B. Walczak a,*

a Department of Chemometrics, Institute of Chemistry, Silesian University, 9 Szkolna Street, 40-006 Katowice, Poland
b Department of Physical and Analytical Chemistry, Analytical Chemistry, Uppsala University, P.O. Box 599, SE-751 24 Uppsala, Sweden

Article history: Received 25 December 2007; Received in revised form 8 March 2008; Accepted 11 March 2008; Available online 15 March 2008

Keywords: Comparing data tables; Hyphenated techniques; Two-dimensional fingerprints; Alignment; Warping; Chemometrics

Abstract

Hyphenated techniques such as capillary electrophoresis–mass spectrometry (CE–MS) or high-performance liquid chromatography with diode array detection (HPLC–DAD) are known to produce a huge amount of data, since each sample is characterized by a two-way data table. In this paper, different ways of obtaining sample-related information from a set of such tables are discussed. Working with original data requires alignment techniques due to time shifts caused by unavoidable variations in separation conditions. Other pre-processing techniques have been suggested to facilitate comparison among samples without prior peak alignment, for example, ‘binning’ and/or ‘blurring’ the data along the time dimension. All these techniques, however, require optimization of some parameters, and in this paper an alternative parameter-free method is proposed. The individual data tables (X) are represented as Gram matrices (XXᵀ), where the summation is taken over the time dimension.
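To illustrate why summing over the time dimension removes the dependence on time position, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes a table X arranged as detection channels × time points; the random data, the function name `gram_matrix`, and the cyclic shift (a simplification of real migration-time shifts) are hypothetical choices made for illustration only.

```python
import numpy as np


def gram_matrix(X):
    """Return X @ X.T for a (channels x time) table; the product sums over time."""
    X = np.asarray(X, dtype=float)
    return X @ X.T


# Hypothetical data table: 5 detection channels x 200 time points.
rng = np.random.default_rng(0)
X = rng.random((5, 200))

# Assumption: a time shift is modelled as a cyclic shift of 7 positions
# along the time axis, i.e. a pure reordering of the columns.
X_shifted = np.roll(X, 7, axis=1)

G = gram_matrix(X)
G_shifted = gram_matrix(X_shifted)

# True when the two Gram matrices agree within floating-point tolerance.
shift_invariant = np.allclose(G, G_shifted)
print(G.shape, shift_invariant)
```

Because the product sums over all time points, any reordering of the columns leaves XXᵀ unchanged. Non-uniform stretching, which changes how many time points a peak spans, is not covered by this toy example.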
Hence the possible variations in time scale are eliminated, while the time information is at least partly preserved by the correlation structure between the detection channels. For comparison among samples, a similarity matrix is constructed and explored by principal component analysis and hierarchical clustering. The Gram matrix approach was tested and compared to some other methods using ‘binned’ and ‘blurred’ data for a data set with CE–MS runs on urine samples. In addition to data exploration by principal component analysis and hierarchical clustering, a discriminant partial least squares model was constructed to discriminate between the samples that were taken with and without the prior intake of a drug. The results showed that the proposed method is at least as good as the others with respect to cluster identification and class prediction. A distinct advantage is that there is no need for parameter optimization, while a potential drawback is the large size of the Gram matrices for data with high mass resolution.

© 2008 Elsevier B.V. All rights reserved.

Corresponding author: B. Walczak. Tel.: +48 32 3592115; fax: +48 32 2599978. E-mail address: beata@us.edu.pl.

1. Introduction

Nowadays, hyphenated chromatographic techniques, such as high-performance liquid chromatography–mass spectrometry (LC–MS), are frequently used in proteomic and metabolomic studies. They provide as output a data table for each individual chromatographic run in which a sample is analyzed. Therefore, the data collected in the analysis of several samples can be viewed as a three-way array, e.g. retention time × multiple detector responses × samples.

A major advantage of the hyphenated techniques over the chromatographic methods equipped with monochannel detectors is the possibility to overcome problems with co-elution and to verify the purity of chromatographic peaks [1]. The hyphenated chromatographic techniques are often the methods of choice in order to
obtain fingerprints of complex mixtures like biofluids (e.g. urine and serum samples), environmental samples, peptides, food samples, etc., where the goal is to find differences among samples and to identify components responsible for these differences. Moreover, the obtained data are very complex and their chemometric exploration is still an ongoing challenge [2].

0021-9673/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.chroma.2008.03.027

Usually the collected three-way data are matricized (unfolded), for example, as samples × (retention time × multiple detector responses), and further explored with unsupervised chemometric techniques designed to process two-way data. The applications of principal component analysis (PCA) [3] and different clustering techniques [4] seem to be dominant in the literature. With PCA, data visualization and a study of similarities among samples are possible by projecting samples onto selected pairs of principal components that describe the majority of the data variance. Additionally, PCA helps to analyze relationships among the explanatory variables and their contributions to individual principal components. In addition to PCA, the hierarchical clustering approaches