ComplexQuant: High-throughput computational pipeline for the global quantitative analysis of endogenous soluble protein complexes using high resolution protein HPLC and precision label-free LC/MS/MS

Cuihong Wan a,b, Jian Liu a,b, Vincent Fong a,b, Andrew Lugowski b, Snejana Stoilova b, Dylan Bethune-Waddell b, Blake Borgeson c, Pierre C. Havugimana a,b, Edward M. Marcotte c, Andrew Emili a,b

a Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, Ontario, Canada M5S 3E1
b Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College St., Toronto, Ontario, Canada M5S 3E1
c Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Department of Chemistry and Biochemistry, University of Texas at Austin, Austin, TX, USA

ABSTRACT

The experimental isolation and characterization of stable multi-protein complexes are essential to understanding the molecular systems biology of a cell. To this end, we have developed a high-throughput proteomic platform for the systematic identification of native protein complexes based on extensive fractionation of soluble protein extracts by multi-bed ion exchange high performance liquid chromatography (IEX-HPLC) combined with exhaustive label-free LC/MS/MS shotgun profiling. To support these studies, we have built a companion data analysis software pipeline, termed ComplexQuant. Proteins present in the hundreds of fractions typically collected per experiment are first identified by exhaustively interrogating MS/MS spectra using multiple database search engines within an integrative probabilistic framework, while accounting for possible post-translational modifications. Protein abundance is then measured across the fractions based on normalized total spectral counts and precursor ion intensities using a dedicated tool, PepQuant.
This analysis allows co-complex membership to be inferred based on the similarity of extracted protein co-elution profiles. Each computational step has been optimized for processing large-scale biochemical fractionation datasets, and the reliability of the integrated pipeline has been benchmarked extensively.

This article is part of a Special Issue entitled: Proteomics from protein structures to clinical applications (CNPN 2012).

© 2012 Published by Elsevier B.V.

Keywords: Protein complex; Biochemical fractionation; HPLC co-elution; LC/MS/MS; Label-free quantification; Protein–protein interaction

JOURNAL OF PROTEOMICS XX (2012) XXX–XXX

Corresponding author at: Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College St., Toronto, Ontario, Canada M5S 3E1. Tel.: +1 416 946 7281; fax: +1 416 978 7437.
E-mail address: andrew.emili@utoronto.ca (A. Emili).

1874-3919/$ see front matter. http://dx.doi.org/10.1016/j.jprot.2012.10.001

1. Introduction

Physical interactions between proteins are involved in many biochemical processes critical to cell viability, growth and proliferation. Most proteins function as components of larger molecular assemblies formed via the association of specific binding partners. Therefore, the detection and analysis of multi-protein complexes is essential for understanding the basic