An Adaptive Pipeline To Maximize Isobaric Tagging Data in Large-Scale MS-Based Proteomics

John Corthésy, Konstantinos Theofilatos, Seferina Mavroudi,§ Charlotte Macron, Ornella Cominetti, Mona Remlawi, Francesco Ferraro, Antonio Núñez Galindo, Martin Kussmann,# Spiridon Likothanassis,* and Loïc Dayon*

Nestlé Institute of Health Sciences, Lausanne 1015, Switzerland
InSybio, Ltd., Innovations House, 19 Staple Gardens, Winchester SO238SR, United Kingdom
§ Department of Social Work, School of Sciences of Health and Care, Technological Educational Institute of Western Greece, Patras 26334, Greece
Department of Computer Engineering and Informatics, University of Patras, Patras 26500, Greece

Supporting Information

ABSTRACT: Isobaric tagging is the method of choice in mass-spectrometry-based proteomics for comparing several conditions at a time. Despite its multiplexing capabilities, some drawbacks appear when multiple experiments are merged for comparison in large sample-size studies due to the presence of missing values, which result from the stochastic nature of the data-dependent acquisition mode. Another indirect cause of data incompleteness might derive from the proteomic-typical data-processing workflow that first identifies proteins in individual experiments and then only quantifies those identified proteins, leaving a large number of unmatched spectra with quantitative information unexploited. Inspired by untargeted metabolomic and label-free proteomic workflows, we developed a quantification-driven bioinformatic pipeline (Quantify then Identify (QtI)) that optimizes the processing of isobaric tandem mass tag (TMT) data from large-scale studies. This pipeline includes innovative features, such as peak filtering with a self-adaptive preprocessing pipeline optimization method, Peptide Match Rescue, and Optimized Post-Translational Modification.
QtI outperforms a classical benchmark workflow in terms of quantification and identification rates, significantly reducing missing data while preserving unmatched features for quantitative comparison. The number of unexploited tandem mass spectra was reduced by 77% and 62% for two human data sets, from cerebrospinal fluid and plasma, respectively.

KEYWORDS: algorithms, bioinformatics, biomarkers, discovery, isobaric tagging, machine learning, protein identification, quantification, tandem mass spectrometry, tandem mass tag

INTRODUCTION

Mass-spectrometry (MS)-based shotgun proteomics can generate large data sets, composed of millions of tandem mass spectra, that are usually first matched by comparison with theoretical fragmentation spectra of defined proteolytic peptide sequences.¹ In many workflows, protein identification based on spectral matching is followed by quantification of the identified proteins under different biological conditions.² However, this broadly used data-processing pipeline, referred to as the "classical workflow" in the following, can prove rather inefficient considering the large amount of unexploited spectral information. For instance, only a fraction of all acquired spectra matched in independent experiments is further used to provide complete quantitative information. The primacy of the protein identification step is largely historical, stemming from the initial role of MS in identifying proteins after gel electrophoresis.³ Protein identification may thus have inappropriately remained the first processing task performed, to the detriment of quantification in many workflows, especially those employing isobaric labeling.
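The contrast between the two processing orders can be made concrete with a toy sketch. It is not the QtI pipeline: the `Spectrum` class, the helper functions, and the five example spectra are all hypothetical and chosen only to show how an identification-first filter discards spectra that still carry usable reporter-ion signal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Spectrum:
    """Hypothetical minimal record for one tandem mass spectrum."""
    scan_id: int
    reporter_intensities: Tuple[float, ...]  # one value per isobaric channel
    peptide: Optional[str] = None            # None when spectral matching failed


def has_quant(s: Spectrum, min_intensity: float = 0.0) -> bool:
    """True if every reporter channel is above the (assumed) threshold."""
    return all(i > min_intensity for i in s.reporter_intensities)


def classical_workflow(spectra: List[Spectrum]) -> List[Spectrum]:
    """Identify first: only matched spectra are passed on to quantification."""
    return [s for s in spectra if s.peptide is not None and has_quant(s)]


def quantify_first(spectra: List[Spectrum]) -> List[Spectrum]:
    """Quantify first: keep every spectrum with usable reporter signal."""
    return [s for s in spectra if has_quant(s)]


def unexploited(spectra: List[Spectrum], used: List[Spectrum]) -> int:
    """Number of acquired spectra that contribute nothing downstream."""
    return len(spectra) - len(used)


# Made-up example data: two matched spectra, three unmatched ones.
SPECTRA = [
    Spectrum(1, (120.0, 98.0, 110.0), peptide="PEPTIDEK"),
    Spectrum(2, (45.0, 51.0, 40.0), peptide="SAMPLER"),
    Spectrum(3, (300.0, 280.0, 310.0)),  # unmatched, full reporter signal
    Spectrum(4, (75.0, 80.0, 66.0)),     # unmatched, full reporter signal
    Spectrum(5, (10.0, 0.0, 12.0)),      # unmatched, one empty channel
]

if __name__ == "__main__":
    print("classical, unexploited:", unexploited(SPECTRA, classical_workflow(SPECTRA)))
    print("quantify first, unexploited:", unexploited(SPECTRA, quantify_first(SPECTRA)))
```

In this made-up data set the identification-first filter leaves three of five spectra unused, whereas keeping all spectra with complete reporter signal leaves only one; the unmatched spectra remain available for later comparison or identification attempts.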
Received: February 15, 2018. Published: April 26, 2018. Cite this: J. Proteome Res. 2018, 17, 2165-2173. DOI: 10.1021/acs.jproteome.8b00110. © 2018 American Chemical Society. pubs.acs.org/jpr

The yield of spectral matching in MS-based proteomics is rather limited, and several reports have indicated typical success rates between 20 and 50%,⁴,⁵ due to, for instance, sequence isoforms and variants not present in databases, the presence of unexpected and multiple modifications, or low tandem spectral information (e.g., low spectral representation or low peak intensity depending on the nature of the peptides but also the timing of their fragmentation during the chromatographic elution). These figures also apply when isobaric tagging technologies (e.g., isobaric tags for relative and absolute quantitation (iTRAQ)⁶ and tandem mass tag (TMT)⁷) are used for relative protein quantification. Importantly, many unmatched tandem mass spectra from such labeling experiments harbor information on true peptides possibly assignable