828 VOLUME 34 NUMBER 8 AUGUST 2016 NATURE BIOTECHNOLOGY

PERSPECTIVE

The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well-suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of ‘living data’ through continuous reanalysis of deposited data.

NP from marine and terrestrial environments, including their inhabiting microorganisms, plants, animals, and humans, are routinely analyzed using MS. However, a single MS experiment can collect thousands of MS/MS spectra in minutes [1], and individual projects can acquire millions of spectra. These data sets are too large for manual analysis. Furthermore, comprehensive software and proper computational infrastructure are not readily available and only low-throughput sharing of either raw or annotated spectra is feasible, even among members of the same laboratory. The potentially useful information in MS/MS data sets can thus remain buried in papers, laboratory notebooks, and private databases, hindering retrieval, mining, and sharing of data and knowledge.

Although several NP databases (Dictionary of Natural Products [2], AntiBase [3], and MarinLit [4]) assist in dereplication (identification of known compounds), these resources are not freely available and do not process MS data. Conversely, MS databases, including MassBank [5], Metlin [6], mzCloud [7], and ReSpect [8], host MS/MS spectra but limit data analyses to several individual spectra or a limited number of liquid chromatography (LC)–MS files. Other free online computational resources that leverage the MS/MS spectra of Metlin, such as those provided by mzCloud and XCMS Online, are available. However, neither of those allows free download of its reference library.

Global genomics and proteomics research has been facilitated by the development of integral resources, such as the US National Center for Biotechnology Information (NCBI; Bethesda, MD, USA) and UniProt KnowledgeBase (UniProtKB), which provide robust platforms for data sharing and knowledge dissemination [9,10]. Recognizing the need for an analogous community platform to analyze NP MS data, we present GNPS. GNPS is a data-driven platform for the storage, analysis, and knowledge dissemination of MS/MS spectra that enables community sharing of raw spectra, continuous annotation of deposited data, and collaborative curation of reference spectra (referred to as spectral libraries) and experimental data (organized as data sets).

GNPS provides the ability to analyze a data set and to compare it to all publicly available data. By building on the computational infrastructure of the University of California San Diego (UCSD) Center for Computational Mass Spectrometry (CCMS; http://proteomics.ucsd.edu/), GNPS provides public data set deposition and/or retrieval through the Mass Spectrometry Interactive Virtual Environment (MassIVE) data repository.
The GNPS analysis infrastructure further enables online dereplication [6,11–13], automated molecular networking analysis [14–21], and crowdsourced MS/MS spectrum curation. Each data set added to the GNPS repository is automatically reanalyzed in the next monthly cycle of continuous identification (see ‘Living data by continuous analysis’ below). Each of these tens of millions of spectra in GNPS data sets is matched to reference spectral libraries to annotate molecules and to discover putative analogs (Fig. 1a).

From January 2014 to November 2015, GNPS grew to serve 9,267 users from 100 countries (Fig. 1b), with 42,486 analysis sessions that have processed >93 million spectra as molecular networks from a quarter-million LC–MS runs. Searches against a combined catalog of over 221,000 MS/MS reference library spectra from 18,163 compounds (Supplementary Table 1) are possible, and GNPS has matched almost one hundred million MS/MS spectra in all public and private search jobs using an estimated 84,000 compute hours.

Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking
A full list of authors and affiliations appears at the end of the paper.
Received 11 August 2015; accepted 10 May 2016; published online 9 August 2016; doi:10.1038/nbt.3597
© 2016 Nature America, Inc. All rights reserved.

GNPS spectral libraries

GNPS spectral libraries enable dereplication, variable dereplication (approximate matches to spectra of related molecules), and identification of spectra in molecular networks. GNPS has collected available MS/MS spectral libraries relevant to NP (which also include other metabolites and molecules), including MassBank [5], ReSpect [8], and NIST [22] (Table 1, Fig. 2a and Supplementary Table 1). Altogether, these third-party libraries total 212,230 MS/MS spectra representing 12,694 unique compounds (Fig. 2b). Although this combined collection of reference spectra provides a starting point for dereplication, only