Integration of Molecular Networking and In-Silico MS/MS Fragmentation for Natural Products Dereplication

Pierre-Marie Allard, Tiphaine Péresse, Jonathan Bisson,§ Katia Gindro, Laurence Marcourt, Van Cuong Pham, Fanny Roussi, Marc Litaudon, and Jean-Luc Wolfender*

School of Pharmaceutical Sciences, EPGL, University of Geneva, University of Lausanne, Quai Ernest-Ansermet 30, CH-1211 Geneva 4, Switzerland
Institut de Chimie des Substances Naturelles CNRS UPR 2301, University Paris-Saclay, 1 Avenue de la Terrasse, 91198 Gif-sur-Yvette, France
§ Center for Natural Product Technologies, Department of Medicinal Chemistry and Pharmacognosy, College of Pharmacy, University of Illinois at Chicago, 833 South Wood Street, Chicago, Illinois 60612, United States
Mycology and Biotechnology group, Institute for Plant Production Sciences IPS, Agroscope, Route de Duillier 50, P.O. Box 1012, 1260 Nyon, Switzerland
Institute of Marine Biochemistry of the Vietnam Academy of Science and Technology (VAST), 18 Hoang Quoc Viet road, Cau Giay, Hanoi, Vietnam

Supporting Information

ABSTRACT: Dereplication represents a key step for rapidly identifying known secondary metabolites in complex biological matrices. In this context, liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS) is increasingly used and, via untargeted data-dependent MS/MS experiments, massive amounts of detailed information on the chemical composition of crude extracts can be generated. An efficient exploitation of such data sets requires automated data treatment and access to dedicated fragmentation databases. Various novel bioinformatics approaches such as molecular networking (MN) and in-silico fragmentation tools have emerged recently and provide new perspectives for early metabolite identification in natural products (NPs) research. Here we propose an innovative dereplication strategy based on the combination of MN with an extensive in-silico MS/MS fragmentation database of NPs.
Using two case studies, we demonstrate that this combined approach offers a powerful tool to navigate through the chemistry of complex NP extracts, dereplicate metabolites, and annotate analogues of database entries.

In natural products (NPs) research, crude extracts of various origins (e.g., plants, marine organisms, and microorganisms) containing thousands of metabolites have to be characterized, either as part of bioactivity-guided isolation studies for drug discovery purposes or in the framework of metabolomics investigations for biomarker identification. Isolation and de novo structural elucidation of NPs is a tedious task and should ideally be performed only for new metabolites, to avoid the costly reisolation of known molecules.¹ Unambiguous metabolite identification thus represents one of the major bottlenecks in metabolomics studies and in NP chemistry.² The rapid identification of known metabolites by comparison of experimental spectral data to databases is referred to as dereplication. This dereplication process is now mandatory to efficiently guide the isolation of only valuable NPs or biomarkers within their complex biological matrices.³

Notable improvements in metabolite profiling methods have been mainly related to the introduction of ultrahigh-performance liquid chromatography (UHPLC) with sub-2 μm particle columns and to the development of benchtop high-resolution mass spectrometry (HRMS) detectors. Detailed information on the chemical composition of crude natural extracts can now be obtained efficiently.⁴ High-resolution MS data, when used in combination with orthogonal heuristic filters such as isotopic pattern distribution, can lead to the correct molecular formula of the analytes in many cases.⁵,⁶ Nevertheless, even with the correct molecular formula, isomers cannot be resolved, and additional spectral information is then needed in order to discriminate between the potential candidates.
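
The limitation of accurate mass alone can be illustrated with a minimal sketch. The Python snippet below computes the theoretical [M+H]+ m/z of a few candidate molecular formulas and keeps those that fall within a ppm tolerance of a measured value. The candidate list (quercetin, morin, kaempferol), the measured m/z, the 5 ppm tolerance, and the helper names are illustrative assumptions and are not taken from this study.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, rounded to 8 decimals.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647  # proton mass (u), added to form [M+H]+


def parse_formula(formula):
    """Turn a simple formula such as 'C15H10O7' into {'C': 15, 'H': 10, 'O': 7}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def mz_protonated(formula):
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    counts = parse_formula(formula)
    neutral = sum(MONOISOTOPIC[el] * n for el, n in counts.items())
    return neutral + PROTON


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# Hypothetical candidates: two isomers and one compound with a different formula.
candidates = {
    "quercetin": "C15H10O7",
    "morin": "C15H10O7",
    "kaempferol": "C15H10O6",
}
measured_mz = 303.0502   # hypothetical measured [M+H]+ value
tolerance_ppm = 5.0      # assumed mass accuracy window

matches = [
    name
    for name, formula in candidates.items()
    if abs(ppm_error(measured_mz, mz_protonated(formula))) <= tolerance_ppm
]
print(matches)
```

In this sketch both C15H10O7 isomers pass the mass filter, whereas kaempferol is rejected, so the accurate mass narrows the candidate list without singling out one structure.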
Tandem MS/MS offers structural insights by breaking the analyzed ion into fragment ions and measuring their m/z ratios. Tandem MS/MS data are thus more discriminating in a dereplication process than the parent mass alone.⁷ However, the manual inspection of individual MS/MS spectra is a tedious task, and the complexity and amount of data generated by LC-MS/MS analysis of complex extracts make automated methods preferable.

Received: December 18, 2015. Accepted: February 16, 2016. Anal. Chem. (pubs.acs.org/ac), American Chemical Society. DOI: 10.1021/acs.analchem.5b04804

Recently, various bioinformatics approaches have been developed to organize or interpret large sets of MS/MS