Integration of Molecular Networking and In-Silico MS/MS
Fragmentation for Natural Products Dereplication
Pierre-Marie Allard,† Tiphaine Péresse,‡ Jonathan Bisson,§ Katia Gindro,∥ Laurence Marcourt,† Van Cuong Pham,⊥ Fanny Roussi,‡ Marc Litaudon,‡ and Jean-Luc Wolfender*,†
†School of Pharmaceutical Sciences, EPGL, University of Geneva, University of Lausanne, Quai Ernest-Ansermet 30, CH-1211 Geneva 4, Switzerland
‡Institut de Chimie des Substances Naturelles CNRS UPR 2301, University Paris-Saclay, 1 Avenue de la Terrasse, 91198 Gif-sur-Yvette, France
§Center for Natural Product Technologies, Department of Medicinal Chemistry and Pharmacognosy, College of Pharmacy, University of Illinois at Chicago, 833 South Wood Street, Chicago, Illinois 60612, United States
∥Mycology and Biotechnology group, Institute for Plant Production Sciences IPS, Agroscope, Route de Duillier 50, P.O. Box 1012, 1260 Nyon, Switzerland
⊥Institute of Marine Biochemistry of the Vietnam Academy of Science and Technology (VAST), 18 Hoang Quoc Viet road, Cau Giay, Hanoi, Vietnam
Supporting Information
ABSTRACT: Dereplication represents a key step for rapidly identifying known secondary
metabolites in complex biological matrices. In this context, liquid chromatography coupled
to high-resolution mass spectrometry (LC-HRMS) is increasingly used and, via untargeted
data-dependent MS/MS experiments, massive amounts of detailed information on the
chemical composition of crude extracts can be generated. An efficient exploitation of such
data sets requires automated data treatment and access to dedicated fragmentation
databases. Various novel bioinformatics approaches such as molecular networking (MN)
and in-silico fragmentation tools have emerged recently and provide new perspectives for
early metabolite identification in natural products (NPs) research. Here we propose an
innovative dereplication strategy based on the combination of MN with an extensive
in-silico MS/MS fragmentation database of NPs. Using two case studies, we demonstrate that
this combined approach offers a powerful tool to navigate through the chemistry of
complex NP extracts, dereplicate metabolites, and annotate analogues of database entries.
In natural products (NPs) research, crude extracts of various
origins (e.g., plants, marine organisms, and microorganisms)
containing thousands of metabolites have to be characterized,
either as part of bioactivity-guided isolation studies for drug
discovery purposes or in the context of metabolomics
investigations for biomarker identification. Isolation and de
novo structural elucidation of NPs is a tedious task and should
ideally only be performed for new metabolites to avoid the
costly reisolation process of known molecules.¹ Unambiguous
metabolite identification thus represents one of the major
bottlenecks in metabolomics studies and in NPs chemistry.²
The rapid identification of known metabolites by comparison of
experimental spectral data to databases is referred to as
dereplication. This dereplication process is now mandatory to
efficiently guide the isolation of only valuable NPs or
biomarkers within their complex biological matrices.³ Notable
improvements in metabolite profiling methods have been
mainly related to the introduction of ultrahigh-performance
liquid chromatography (UHPLC) with sub-2 μm particle
columns and to the development of benchtop high-resolution
mass spectrometry (HRMS) detectors. Detailed information on
the chemical composition of crude natural extracts can now be
efficiently obtained.⁴ High-resolution MS data, when used in
combination with orthogonal heuristic filters, such as isotopic
pattern distribution, can lead to the correct molecular
formula of the analytes in many cases.⁵,⁶ Nevertheless, even
with the correct molecular formula, isomers cannot be resolved,
and additional spectral information is then needed to
discriminate between the potential candidates. Tandem mass
spectrometry (MS/MS) offers structural insights by breaking the
analyzed ion into fragment ions and measuring their m/z values. MS/MS
data is thus more discriminating in a dereplication process than
the parent mass alone.⁷ However, the manual inspection of
individual MS/MS spectra is a tedious task, and the complexity
and amount of data generated by LC−MS/MS analysis of
complex extracts make automated methods preferable.
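The limitation of accurate mass alone, noted above, can be illustrated with a minimal Python sketch. It is not part of the workflow described in this article: the three-entry candidate table, the measured m/z value, the 5 ppm tolerance, and all function names are assumptions chosen for illustration. The sketch computes the theoretical [M+H]+ m/z of each candidate and retains those within the tolerance of the measured value.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688  # proton mass (u), added to obtain [M+H]+

# Illustrative toy database (assumption): two isomeric flavonols and one
# compound with a different molecular formula.
CANDIDATES = {
    "quercetin": {"C": 15, "H": 10, "O": 7},
    "morin": {"C": 15, "H": 10, "O": 7},
    "kaempferol": {"C": 15, "H": 10, "O": 6},
}


def mono_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def mz_protonated(formula):
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    return mono_mass(formula) + PROTON


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match(measured_mz, candidates, tol_ppm=5.0):
    """Names of candidates whose [M+H]+ lies within tol_ppm of measured_mz."""
    return [
        name
        for name, formula in candidates.items()
        if abs(ppm_error(measured_mz, mz_protonated(formula))) <= tol_ppm
    ]


# Hypothetical measured precursor ion (not taken from the article).
hits = match(303.0502, CANDIDATES)
print(hits)
```

With these assumed inputs, both C15H10O7 isomers are retained while the C15H10O6 entry is rejected: the accurate mass narrows the search to one molecular formula but cannot rank the isomers, which is where MS/MS fragmentation data becomes necessary.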
Recently, various bioinformatics approaches have been
developed to organize or interpret large sets of MS/MS
Received: December 18, 2015
Accepted: February 16, 2016
pubs.acs.org/ac
© XXXX American Chemical Society. DOI: 10.1021/acs.analchem.5b04804
Anal. Chem. XXXX, XXX, XXX−XXX