Analytica Chimica Acta 733 (2012) 16–22
journal homepage: www.elsevier.com/locate/aca
Systematic ratio normalization of gas chromatography signals for biological
sample discrimination and biomarker discovery
Benoist Lehallier a, Jérémy Ratel a, Mohamed Hanafi b, Erwan Engel a,∗
a INRA, UR 370 QuaPA, MASS laboratory, 63122 Saint-Genès-Champanelle, France
b ONIRIS, Sensometrics & Chemometrics Laboratory, site de la Géraudière, 44322 Nantes, France
Article info
Article history:
Received 2 September 2011
Received in revised form 3 April 2012
Accepted 10 April 2012
Available online 25 April 2012
Keywords:
Systematic ratio normalization
Discrimination
Biomarker
Gas chromatography–mass spectrometry
Volatile compounds
Abstract
The present paper introduces a new gas chromatography data processing procedure, dubbed
systematic ratio normalization (SRN), that improves both sample set discrimination and
biomarker identification. SRN consists of (1) calculating, for each sample, all the log-ratios
between abundances of chromatography-analyzed compounds, then (2) selecting the log-ratio(s)
that maximize the discrimination between sample sets. The relevance of SRN was evaluated on
two data sets acquired through gas chromatography–mass spectrometry as part of separate
studies designed (i) to discriminate vegetable oils by source origin when analyzed on an
analytical system exposed to instrument drift (data set 1) and (ii) to discriminate meat
samples by animal feed when aged for different durations (data set 2). Applying SRN to raw
data made it possible to obtain robust discrimination models for the two data sets by
enhancing the contribution of the factor-of-interest to the data variance while stabilizing
the contribution of the disturbance factor. The most discriminant log-ratios were shown to
involve the most relevant biomarkers, which presented relative independence of the
factor-of-interest as well as co-behavior with respect to the disturbance effects potentially
biasing the discrimination, such as instrument drift or sample biochemical changes. SRN can
be run a posteriori on any data set, and might be generalizable to most separation methods.
© 2012 Elsevier B.V. All rights reserved.
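The two SRN steps described in the abstract can be illustrated with a minimal Python sketch. Several elements here are assumptions rather than the authors' implementation: the discrimination criterion is taken to be a one-way ANOVA F statistic (the abstract does not name a criterion), the function names `all_log_ratios`, `f_statistic` and `srn_rank` are hypothetical, and the toy abundance matrix is invented purely for illustration. Compounds 0 and 1 share a multiplicative per-sample drift, and only compound 0 depends on the group.

```python
import itertools
import numpy as np


def all_log_ratios(abundances):
    """Step 1: log-ratios for every compound pair (i, j) with i < j.

    abundances: array (n_samples, n_compounds) of strictly positive values.
    Returns the list of pairs and an array (n_samples, n_pairs).
    """
    log_ab = np.log(np.asarray(abundances, dtype=float))
    pairs = list(itertools.combinations(range(log_ab.shape[1]), 2))
    i_idx = [i for i, _ in pairs]
    j_idx = [j for _, j in pairs]
    # log(a_i / a_j) = log(a_i) - log(a_j)
    return pairs, log_ab[:, i_idx] - log_ab[:, j_idx]


def f_statistic(values, groups):
    """One-way ANOVA F statistic per column (assumed criterion)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = values.mean(axis=0)
    ss_between = np.zeros(values.shape[1])
    ss_within = np.zeros(values.shape[1])
    for g in labels:
        block = values[groups == g]
        ss_between += len(block) * (block.mean(axis=0) - grand_mean) ** 2
        ss_within += ((block - block.mean(axis=0)) ** 2).sum(axis=0)
    df_between = len(labels) - 1
    df_within = len(values) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)


def srn_rank(abundances, groups, top=1):
    """Step 2: rank log-ratios by decreasing discrimination score."""
    pairs, ratios = all_log_ratios(abundances)
    scores = f_statistic(ratios, groups)
    order = np.argsort(scores)[::-1][:top]
    return [(pairs[k], float(scores[k])) for k in order]


# Invented toy data: 6 samples, 3 compounds, 2 sample sets.
groups = np.array(["A", "A", "A", "B", "B", "B"])
drift = np.array([1.0, 2.0, 4.0, 1.5, 3.0, 0.5])    # disturbance factor
c0 = np.array([10.0, 11.0, 9.0, 20.0, 22.0, 18.0])  # depends on the group
c1 = np.array([5.0, 5.2, 4.9, 5.1, 4.8, 5.0])       # independent of the group
c2 = np.array([3.0, 7.0, 4.0, 6.0, 5.0, 8.0])       # not affected by the drift
X = np.column_stack([drift * c0, drift * c1, c2])

for pair, score in srn_rank(X, groups, top=3):
    print(pair, round(score, 2))
```

In this toy case the shared drift cancels in the log-ratio of compounds 0 and 1, so that pair ranks first, whereas compound 0 taken alone is dominated by the drift. Any other univariate or multivariate discrimination score could be substituted for `f_statistic`.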
1. Introduction
Following the extensive work being done in the areas of metabolomics and proteomics [1,2],
the discrimination of biological samples based on mass spectrometry has become of interest
in its own right [3]. Differentiating sample sets according to a factor-of-interest hinges
on highlighting distinctive components that may only be present in trace amounts, while
minimizing the influence of other factors liable to mask the discriminant function, even
partially. Gas chromatography coupled to mass spectrometry (GC–MS) is well suited to the
discrimination of complex matrices such as processed foods or biological samples, both in
terms of technical accuracy and quantification of small-molecular-weight compounds [4–6].
Together with peak alignment, mass spectra deconvolution and compound identification [7],
signal normalization represents one key bottleneck to a comprehensive discovery of
distinctive biomarkers. Despite the increasingly powerful performance of commercially
available instruments, extracting useful information from analytical signals still requires
chemometric normalization tools in order to minimize the influence of disturbance factors,
some of which are tied to the technique employed while others are inherent to the sample
itself [6,8].
∗ Corresponding author. Tel.: +33 04 73 62 45 89; fax: +33 04 73 62 47 31.
E-mail address: erwan.engel@clermont.inra.fr (E. Engel).
Normalization is generally defined as a processing procedure designed to suppress systematic
variance that is unrelated to the relevant signal [9,10]. Among the data normalization
methods available, variable ratios became widely adopted during the twentieth century [11].
A number of authors have proposed normalizing each compound in the GC–MS signal by one or
more compounds that are found naturally in every chromatogram and whose variance is
independent of the factor studied [12]. This method, called diagnostic ratios, does however
require a priori selection of reference compounds and affects the covariance between
normalized variables [9]. Internal signal normalization by the sum of the signal components
is commonly performed in data analysis to overcome the effects of variations in sensitivity
and injected quantities on the intensity of recorded signals [13,14]. However, this
procedure can prove insufficient, as the individual normalized variables have a high
covariance because each is expressed as a percentage of the sum total, which generates
statistical cross-links [15]. Moreover, this procedure may
distort the data severely if the assumption of a lack of overall
0003-2670/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.aca.2012.04.019