REVIEW

The correspondence problem for metabonomics datasets

K. Magnus Åberg · Erik Alm · Ralf J. O. Torgrip

Received: 31 October 2008 / Accepted: 15 January 2009 / Published online: 7 February 2009
© Springer-Verlag 2009

Abstract In metabonomics it is difficult to tell which peak is which in datasets with many samples. This is known as the correspondence problem. Data from different samples are not synchronised, i.e., the peak from one metabolite does not appear in exactly the same place in all samples. For datasets with many samples, this problem is nontrivial, because each sample contains hundreds to thousands of peaks that shift and are identified ambiguously. Statistical analysis of the data assumes that peaks from one metabolite are found in one column of a data table. For every error in the data table, the statistical analysis loses power and the risk of missing a biomarker increases. It is therefore important to solve the correspondence problem by synchronising samples, but there is no method that solves it once and for all. In this review, we analyse the correspondence problem, discuss current state-of-the-art methods for synchronising samples, and predict the properties of future methods.

Keywords Alignment · Warping · Chromatography · Metabolic profiling · NMR · Mass spectrometry (MS)

Introduction

This critical review focuses on the correspondence problem and its properties for metabonomics datasets. Starting from the properties of NMR and chromatography–mass spectrometry data, a selection of current state-of-the-art synchronisation methods is discussed. This review is intended as a guide to the problem and to the current attempts at solving it.

Recent reviews dealing with this problem are those of Listgarten and Emili [1] and Vandenbogaert et al. [2]. The review of Listgarten has a wider scope: statistical methods for comparative proteomic profiling. Vandenbogaert reviews alignment of LC–MS images with a focus on proteomics and detection of biomarkers.
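The data-table assumption stated in the abstract (one metabolite, one column) can be made concrete with a small sketch. The peak positions, intensities, sample names, and the fixed bin width below are all invented for illustration and are not taken from the review; fixed-grid binning is used only as a naive baseline to show how small shifts scatter one metabolite over several columns, not as a method the authors propose.

```python
# Hypothetical peak lists: (position, intensity) per sample.
# All values are invented; they mimic two metabolites whose peaks
# drift slightly between samples.
samples = {
    "sample_1": [(1.98, 10.0), (3.52, 4.0)],
    "sample_2": [(2.01, 11.0), (3.49, 5.0)],
    "sample_3": [(2.03, 9.0), (3.51, 4.5)],
}

BIN_WIDTH = 0.1  # assumed width of a fixed column grid


def bin_index(position, width=BIN_WIDTH):
    """Map a peak position to a column index on a fixed grid."""
    return int(position / width)


def build_table(peak_lists):
    """Build a {sample: {column: intensity}} table by fixed binning."""
    table = {}
    for name, peaks in peak_lists.items():
        row = {}
        for position, intensity in peaks:
            col = bin_index(position)
            row[col] = row.get(col, 0.0) + intensity
        table[name] = row
    return table


table = build_table(samples)
columns = sorted({col for row in table.values() for col in row})

# Print one row per sample; a zero marks a column with no peak.
for name, row in table.items():
    print(name, [row.get(col, 0.0) for col in columns])
```

With these made-up numbers the two metabolites end up spread over four columns, so a column-wise statistical comparison would treat each metabolite as two partly missing variables.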
In this review we give a more in-depth description of the correspondence problem with a focus on metabonomics data from NMR and LC–MS.

What is correspondence?

The correspondence problem is about arranging things in their proper place, i.e. putting the right values in the right rows and columns of a data table. An illustrative example is shown in Fig. 1. Suppose you want to compare suppliers of fruit baskets to your office and you have a preference for green apples. You would like to get the most fruit for your money, but there must not be too few green apples. The fruit is sorted according to category and weighed. The weight data are summarized in a data table on which you will base your choice of supplier. The table in Fig. 1 cannot be used for reliable decision-making because

Anal Bioanal Chem (2009) 394:151–162
DOI 10.1007/s00216-009-2628-9

K. M. Åberg (*) · E. Alm · R. J. O. Torgrip
Department of Analytical Chemistry, BioSysteMetrics Group, Stockholm University, 10691 Stockholm, Sweden
e-mail: magnus.aberg@anchem.su.se

Magnus Åberg holds a Ph.D. in chemometrics and has been a researcher in the BioSysteMetrics Group at the Department of Analytical Chemistry at Stockholm University since 2006. His current research interests are developing algorithms and methods for maximizing information recovery from data, e.g. from metabonomics