Journal of Chromatography A, 1192 (2008) 157–165
No-alignment-strategies for exploring a set of two-way data tables obtained from capillary electrophoresis–mass spectrometry
M. Daszykowski (a), R. Danielsson (b), B. Walczak (a,∗)
(a) Department of Chemometrics, Institute of Chemistry, Silesian University, 9 Szkolna Street, 40-006 Katowice, Poland
(b) Department of Physical and Analytical Chemistry, Analytical Chemistry, Uppsala University, P.O. Box 599, SE-751 24 Uppsala, Sweden
Article info
Article history:
Received 25 December 2007
Received in revised form 8 March 2008
Accepted 11 March 2008
Available online 15 March 2008
Keywords:
Comparing data tables
Hyphenated techniques
Two-dimensional fingerprints
Alignment
Warping
Chemometrics
Abstract
Hyphenated techniques such as capillary electrophoresis–mass spectrometry (CE–MS) and
high-performance liquid chromatography with diode array detection (HPLC–DAD) produce
large amounts of data because each sample is characterized by a two-way data table. This
paper discusses different ways of obtaining sample-related information from a set of such
tables. Working with the original data requires alignment techniques because of time shifts
caused by unavoidable variations in separation conditions. Other pre-processing techniques
have been suggested to facilitate comparison among samples without prior peak alignment,
for example, ‘binning’ and/or ‘blurring’ the data along the time dimension.
All these techniques, however, require optimization of some parameters, and in this paper an alternative
parameter-free method is proposed. The individual data tables (X) are represented as
Gram matrices (XXᵀ),
where the summation is taken over the time dimension. Hence the possible variations in time scale are
eliminated, while the time information is at least partly preserved by the correlation structure between
the detection channels. For comparison among samples, a similarity matrix is constructed and explored
by principal component analysis and hierarchical clustering. The Gram matrix approach was tested and
compared to some other methods using ‘binned’ and ‘blurred’ data for a data set with CE–MS runs on urine
samples. In addition to data exploration by principal component analysis and hierarchical clustering, a
discriminant partial least squares model was constructed to discriminate between the samples that were
taken with and without the prior intake of a drug. The result showed that the proposed method is at least
as good as the others with respect to cluster identification and class prediction. A distinct advantage is
that there is no need for parameter optimization, while a potential drawback is the large size of the Gram
matrices for data with high mass resolution.
© 2008 Elsevier B.V. All rights reserved.
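The Gram matrix idea summarized in the abstract can be illustrated with a minimal NumPy sketch. It assumes each data table X is stored as channels × time points, uses synthetic Gaussian peaks with hypothetical channel patterns, and compares samples by the cosine similarity of their vectorized Gram matrices. The similarity measure and all names in the code are assumptions made for illustration, not necessarily the choices made in the paper. The time shift is modelled as a rigid circular shift, which is an idealized case; real shifts are rarely rigid.

```python
import numpy as np

def gram_matrix(X):
    """Return X X^T for a table X of shape (channels, time points).

    The product sums over the time axis, so the result has shape
    (channels, channels) regardless of the number of time points.
    """
    return X @ X.T

def cosine_similarity_matrix(tables):
    """Pairwise cosine similarity between vectorized Gram matrices.

    Cosine similarity is an assumption made for this sketch.
    """
    G = np.vstack([gram_matrix(X).ravel() for X in tables])
    G = G / np.linalg.norm(G, axis=1, keepdims=True)
    return G @ G.T

# Synthetic example: two Gaussian peaks along a 200-point time axis.
t = np.arange(200)
profiles = np.vstack([
    np.exp(-0.5 * ((t - 60) / 4.0) ** 2),
    np.exp(-0.5 * ((t - 120) / 4.0) ** 2),
])

# Hypothetical channel patterns (4 channels x 2 components) for two samples.
spectra_a = np.array([[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.2]])
spectra_b = np.array([[0.0, 1.0], [0.0, 0.3], [1.0, 0.0], [0.6, 0.0]])

X_a = spectra_a @ profiles              # sample A, shape (4, 200)
X_a_shifted = np.roll(X_a, 7, axis=1)   # sample A with a rigid time shift
X_b = spectra_b @ profiles              # sample B, different channel patterns

# Rows/columns: sample A, shifted sample A, sample B.
S = cosine_similarity_matrix([X_a, X_a_shifted, X_b])
print(np.round(S, 3))
```

In this idealized setting the shifted copy of sample A has the same Gram matrix as the original, so the two are indistinguishable in the similarity matrix, while sample B remains distinct. A matrix such as S could then be passed to PCA or hierarchical clustering.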
1. Introduction
Hyphenated chromatographic techniques, such as high-performance
liquid chromatography–mass spectrometry (LC–MS), are frequently
used in proteomic and metabolomic studies. Each chromatographic
run, in which one sample is analyzed, yields a two-way data table.
The data collected for several samples can therefore be viewed as a
three-way array, e.g. retention time × multiple detector responses × samples.
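As a rough illustration of this arrangement, the sketch below stacks several two-way tables into a three-way NumPy array and flattens it into a samples × (time × channels) table of the kind used by two-way methods. The array sizes and the random placeholder data are hypothetical, and the sketch assumes that all runs share the same number of time points and channels.

```python
import numpy as np

# Hypothetical dimensions: 4 samples, 300 time points, 50 detector channels.
n_samples, n_times, n_channels = 4, 300, 50

# Random placeholder data standing in for the individual runs;
# each run is a two-way table of shape (time, channels).
rng = np.random.default_rng(0)
runs = [rng.random((n_times, n_channels)) for _ in range(n_samples)]

# Stack the runs along a third axis: time x channels x samples.
# np.stack requires every run to have the same shape.
cube = np.stack(runs, axis=2)
print(cube.shape)

# Flatten to samples x (time * channels): one row per sample.
unfolded = np.moveaxis(cube, 2, 0).reshape(n_samples, -1)
print(unfolded.shape)
```

Each row of the flattened table is one run laid out time point by time point, so a time shift between runs moves the same signal into different columns. This is why such a layout is sensitive to misalignment.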
A major advantage of hyphenated techniques over chromatographic
methods with single-channel detectors is the possibility of
overcoming co-elution problems and verifying the purity of
chromatographic peaks [1]. Hyphenated chromatographic techniques
are often the methods of choice for obtaining fingerprints of
complex mixtures such as biofluids (e.g. urine and serum samples),
environmental samples, peptides and food samples, where the goal is
to find differences among samples and to identify the components
responsible for these differences. Moreover, the resulting data are
very complex, and their chemometric exploration remains an ongoing
challenge [2].
∗ Corresponding author. Tel.: +48 32 3592115; fax: +48 32 2599978.
E-mail address: beata@us.edu.pl (B. Walczak).
Usually the collected three-way data are matricized (unfolded),
for example into a table of samples × (retention time × multiple
detector responses), and then explored with unsupervised chemometric
techniques designed for two-way data. Applications of principal
component analysis (PCA) [3] and of different clustering
techniques [4] appear to dominate the literature. With
PCA, data visualization and a study of similarities among samples
are possible by projecting samples onto selected pairs of principal
components that describe the majority of the data variance. Addi-
tionally, PCA helps to analyze relationships among the explanatory
variables and their contributions to individual principal compo-
nents. In addition to PCA, the hierarchical clustering approaches
0021-9673/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.chroma.2008.03.027