Protannotator: A Semiautomated Pipeline for Chromosome-Wise Functional Annotation of the “Missing” Human Proteome

Mohammad T. Islam,# Gagan Garg,# William S. Hancock,§ Brian A. Risk, Mark S. Baker, and Shoba Ranganathan*

Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University, Sydney, NSW 2109, Australia
§ Barnett Institute, Northeastern University, 140 The Fenway, Boston, Massachusetts 02115, United States
College of Arts and Sciences, Boise State University, 1910 University Drive, Boise, Idaho 83725, United States
Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 14 Medical Drive, 117599 Singapore

Supporting Information

ABSTRACT: The chromosome-centric human proteome project (C-HPP) aims to define the complete set of proteins encoded in each human chromosome. The neXtProt database (September 2013) lists 20 128 proteins for the human proteome, of which 3831 human proteins (∼19%) are considered “missing” according to the standard metrics table (released September 27, 2013). In support of the C-HPP initiative, we have extended the annotation strategy developed for human chromosome 7 “missing” proteins into a semiautomated pipeline to functionally annotate the “missing” human proteome. This pipeline integrates a suite of bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology, and biochemical pathways. From sequential BLAST searches, we have primarily identified homologues from reviewed nonhuman mammalian proteins with protein evidence for 1271 (33.2%) “missing” proteins, followed by 703 (18.4%) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%) homologues from reviewed human proteins. Functional annotations for 1945 (50.8%) “missing” proteins were also determined.
To accelerate the identification of “missing” proteins from proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE proteogenomic data resulted in proteomic evidence for 107 (2.8%) of the 3831 “missing” proteins, while evidence from a recent membrane proteomic study supported the existence of another 15 “missing” proteins. The chromosome-wise functional annotation of all “missing” proteins is freely available to the scientific community through our web server (http://biolinfo.org/protannotator).

KEYWORDS: Human Proteome Project, human chromosome, missing proteins, sequential BLAST, functional annotation, proteotypic peptides, proteogenomics

INTRODUCTION

The interpretation of the human genome depends on detailed annotation, usually at the nucleotide level, the protein level, and the process level,1 for which the functional annotation of proteins is crucial at the process level. Since 2008, the Human Proteome Organization (HUPO) has pursued the comprehensive identification and functional characterization of the human proteome via the Human Proteome Project (HPP),2 of which the chromosome-centric HPP (C-HPP) approach seeks to catalog the human proteome on the basis of chromosomes.3–5 The International Chromosome-centric Human Proteome Project (C-HPP), launched in 2012, marks the first step toward the genome-wide, chromosome-by-chromosome characterization of the human proteome.6 Such an approach would address a key aim of the human genome project, viz., personalized medicine, by providing sensitive and highly specific protein biomarkers for early onset diagnosis, prognosis, and treatment of several diseases, providing clinical and translational proteomic solutions.
7 The three pillars of HPP are mass spectrometric proteomics, antibody/affinity capturing agents, and a knowledgebase,2 embodied by the neXtProt database,8 where detailed information on the human proteome is collated, curated, and organized for rapid access of information on a query protein.

Special Issue: Chromosome-centric Human Proteome Project. Received: August 1, 2013. Published: December 6, 2013. Article, pubs.acs.org/jpr. © 2013 American Chemical Society. dx.doi.org/10.1021/pr400794x | J. Proteome Res. 2014, 13, 76–83

Our group carried out the functional annotation of missing