Journal of Chromatography A, 1256 (2012) 150–159
Journal homepage: www.elsevier.com/locate/chroma
New supervised alignment method as a preprocessing tool for chromatographic
data in metabolomic studies
Wiktoria Struck, Paweł Wiczling, Małgorzata Waszczuk-Jankowska, Roman Kaliszan,
Michał Jan Markuszewski∗
Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Al. Gen. Hallera 107, 80-416 Gdańsk, Poland
Article info
Article history:
Received 19 March 2012
Received in revised form 19 June 2012
Accepted 26 July 2012
Available online 2 August 2012
Keywords:
Metabolomics
HPLC
Retention time shift
Data alignment
Warping
Preprocessing methods
Abstract
The purpose of this work was to develop a new alignment algorithm, called supervised alignment, and to compare its performance with that of correlation optimized warping. Supervised alignment is based on a “supervised” selection of a few common peaks present in every chromatogram. The selected peaks are aligned on the basis of the difference between the retention times of the selected analytes in the sample chromatogram and in the reference chromatogram. The retention times of the chromatogram fragments between the known peaks are then linearly interpolated. The performance of the proposed algorithm was tested on a series of simulated and experimental chromatograms. The simulated chromatograms comprised analytes with systematic or random retention time shifts. The experimental chromatographic (RP-HPLC) data were obtained during the analysis of nucleosides in 208 urine samples and contain both systematic and random displacements. All data sets were aligned using both correlation optimized warping and supervised alignment. The time required to complete the alignment, the overall complexity of the two algorithms, and their performance, measured by the average correlation coefficients, were compared to assess the tested methods. In the case of systematic shifts, both methods led to successful alignment. However, for random shifts, correlation optimized warping required more time than supervised alignment (a few hours versus a few minutes), and the quality of its alignment, described by the correlation coefficient of the newly aligned matrix, was worse (0.8593 versus 0.9629). For the experimental data set, supervised alignment successfully aligned 208 samples using 10 previously identified peaks. Knowledge of the retention times of a few analytes in the data sets is necessary to perform supervised alignment for both systematic and random shifts. Supervised alignment is a faster, more effective and simpler preprocessing method than correlation optimized warping and can be applied to chromatographic and electrophoretic data sets.
© 2012 Elsevier B.V. All rights reserved.
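The supervised alignment idea summarized in the abstract (shift a few known peaks onto their reference retention times and linearly interpolate everything in between) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation. Assumptions: all chromatograms share one retention-time grid, the landmark peaks are already identified and listed in the same elution order in sample and reference, the first and last grid points are pinned as extra anchors, and the "average correlation coefficient" is taken here as the mean Pearson correlation of each aligned chromatogram with the reference. The names `supervised_align`, `mean_correlation` and `gaussian` are hypothetical helpers, and the peak positions are synthetic.

```python
import numpy as np


def supervised_align(t, signal, ref_peaks, sample_peaks):
    """Align one chromatogram to a reference using known landmark peaks.

    t            -- common, increasing retention-time grid
    signal       -- sample intensities sampled on t
    ref_peaks    -- retention times of the selected analytes in the reference
    sample_peaks -- retention times of the same analytes in the sample
    """
    ref = np.asarray(ref_peaks, dtype=float)
    smp = np.asarray(sample_peaks, dtype=float)
    # Assumption: pin the start and end of the run so every point is covered.
    ref_knots = np.concatenate(([t[0]], ref, [t[-1]]))
    smp_knots = np.concatenate(([t[0]], smp, [t[-1]]))
    # Map each reference time to a sample time by linear interpolation
    # between the matched landmark pairs (piecewise-linear warp).
    t_sample = np.interp(t, ref_knots, smp_knots)
    # Resample the sample signal at the warped times, which moves each
    # landmark peak onto its reference retention time.
    return np.interp(t_sample, t, signal)


def mean_correlation(aligned, reference):
    """Mean Pearson correlation of each chromatogram with the reference."""
    return float(np.mean([np.corrcoef(a, reference)[0, 1] for a in aligned]))


def gaussian(t, center, width=0.1):
    """Synthetic Gaussian peak used only for this illustration."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)


# Synthetic example: three peaks, each displaced by a different amount.
t = np.linspace(0.0, 10.0, 2001)
ref_peaks = [2.0, 5.0, 8.0]
sample_peaks = [2.3, 4.8, 8.4]
reference = sum(gaussian(t, c) for c in ref_peaks)
sample = sum(gaussian(t, c) for c in sample_peaks)

aligned = supervised_align(t, sample, ref_peaks, sample_peaks)
print("before:", round(mean_correlation([sample], reference), 3))
print("after: ", round(mean_correlation([aligned], reference), 3))
```

Because the warp is defined only by the matched landmarks, a mis-identified peak distorts the intervals on both sides of it; this is why, as the abstract notes, prior knowledge of the retention times of a few analytes is required.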
∗ Corresponding author. Tel.: +48 58 349 3260; fax: +48 58 349 3262.
E-mail address: markusz@gumed.edu.pl (M.J. Markuszewski).

1. Introduction
In the current era of evolving bioinformatics methods, there is a clear trend toward analyzing the entire chromatographic data matrix rather than only selected peaks detected in the chromatograms. This broad approach does not require choosing individual analytes for integration and subsequent analysis and, therefore, does not cause data loss. Such a holistic approach overcomes the problem of the enormous volume of data in areas such as metabolomics, which covers, inter alia, metabolic profiling and metabolic fingerprinting and also examines the interactions between the levels of metabolites that are not necessarily identified. By analyzing the entire chromatographic data matrix instead of the concentrations or peak areas of selected analytes, more relevant information about the analyzed sample can be extracted using
appropriate classification and prediction methods. However, prior to such chemometric analyses, it is necessary to correct the retention time shifts that occur either globally or in small sections of the chromatograms. Peaks are shifted because of unavoidable changes in the experimental conditions, caused by minor changes in the mobile phase composition or stationary phase properties, or by the impact of the sample matrix (particularly in the case of biological matrices such as urine or serum). Two types of peak shifts can be distinguished in a real set of chromatograms. In the first type, called systematic, the difference between the retention times of corresponding analytes in two consecutive chromatograms is a continuous function of retention time. This is a very common situation and may be a consequence of column ageing, changes in chromatographic conditions, minor changes in the mobile phase composition, etc. In contrast, for random displacement the difference between the retention times of corresponding analytes is a random variable, so it affects each peak
http://dx.doi.org/10.1016/j.chroma.2012.07.084