Protannotator: A Semiautomated Pipeline for Chromosome-Wise
Functional Annotation of the “Missing” Human Proteome
Mohammad T. Islam,†,‡,# Gagan Garg,†,‡,# William S. Hancock,§ Brian A. Risk,∥ Mark S. Baker,† and Shoba Ranganathan*,†,‡,⊥
†Department of Chemistry and Biomolecular Sciences and ‡ARC Centre of Excellence in Bioinformatics, Macquarie University, Sydney, NSW 2109, Australia
§Barnett Institute, Northeastern University, 140 The Fenway, Boston, Massachusetts 02115, United States
∥College of Arts and Sciences, Boise State University, 1910 University Drive, Boise, Idaho 83725, United States
⊥Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 14 Medical Drive, 117599 Singapore
Supporting Information
ABSTRACT: The chromosome-centric human proteome project (C-HPP)
aims to define the complete set of proteins encoded in each human
chromosome. The neXtProt database (September 2013) lists 20 128
proteins for the human proteome, of which 3831 human proteins (∼19%)
are considered “missing” according to the standard metrics table (released
September 27, 2013). In support of the C-HPP initiative, we have
extended the annotation strategy developed for human chromosome 7
“missing” proteins into a semiautomated pipeline to functionally annotate
the “missing” human proteome. This pipeline integrates a suite of
bioinformatics analysis and annotation software tools to identify
homologues and map putative functional signatures, gene ontology, and
biochemical pathways. From sequential BLAST searches, we have
primarily identified homologues from reviewed nonhuman mammalian
proteins with protein evidence for 1271 (33.2%) “missing” proteins,
followed by 703 (18.4%) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%)
homologues from reviewed human proteins. Functional annotations for 1945 (50.8%) “missing” proteins were also determined.
To accelerate the identification of “missing” proteins from proteomics studies, we generated proteotypic peptides in silico.
Matching these proteotypic peptides to ENCODE proteogenomic data resulted in proteomic evidence for 107 (2.8%) of the
3831 “missing” proteins, while evidence from a recent membrane proteomic study supported the existence of another 15
“missing” proteins. The chromosome-wise functional annotation of all “missing” proteins is freely available to the scientific
community through our web server (http://biolinfo.org/protannotator).
KEYWORDS: Human Proteome Project, human chromosome, missing proteins, sequential BLAST, functional annotation,
proteotypic peptides, proteogenomics
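The abstract mentions generating proteotypic peptides in silico but does not say how. As a minimal sketch only, the following Python code assumes tryptic cleavage (after K or R, except before P) and treats a peptide as a candidate when it occurs in exactly one protein of the input set. The function names, the 7 to 30 residue length window, and the toy sequences are illustrative assumptions and are not taken from the Protannotator pipeline; real proteotypic prediction would also consider detectability by mass spectrometry.

```python
from collections import defaultdict


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    Assumption: full tryptic cleavage with no missed cleavages.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder when it does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def unique_peptides(proteins, min_len=7, max_len=30):
    """Return, per protein ID, the peptides found in that protein only.

    `proteins` maps a protein ID to its sequence. The length window is
    an assumed filter, not a value reported in the paper.
    """
    owners = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            if min_len <= len(peptide) <= max_len:
                owners[peptide].add(protein_id)

    result = {protein_id: [] for protein_id in proteins}
    for peptide, ids in owners.items():
        if len(ids) == 1:
            result[next(iter(ids))].append(peptide)
    for peptide_list in result.values():
        peptide_list.sort()
    return result


# Hypothetical toy sequences, for illustration only.
toy_proteins = {
    "P1": "MKAAAAAAAKGGGGGGGGR",
    "P2": "MKAAAAAAAKCCCCCCCCR",
}
candidates = unique_peptides(toy_proteins)
```

In this toy input the shared peptide is discarded because it maps to both proteins, and the short N-terminal peptide is dropped by the length filter, so only the peptide specific to each protein is kept.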
■ INTRODUCTION
The interpretation of the human genome depends on detailed
annotation, usually at the nucleotide level, the protein level, and
the process level,¹ for which the functional annotation of
proteins is crucial at the process level. Since 2008, the Human
Proteome Organization (HUPO) has pursued the comprehensive
identification and functional characterization of the human
proteome via the Human Proteome Project (HPP),² of which
the chromosome-centric HPP (C-HPP) approach seeks to
catalog the human proteome on the basis of chromosomes.³⁻⁵
The International Chromosome-centric Human Proteome
Project (C-HPP), launched in 2012, marks the first step
toward the genome-wide, chromosome-by-chromosome
characterization of the human proteome.⁶ Such an approach
would address a key aim of the human genome project, viz.
personalized medicine, by providing sensitive and highly
specific protein biomarkers for early onset diagnosis, prognosis,
and treatment of several diseases, providing clinical and
translational proteomic solutions.⁷
The three pillars of HPP are mass spectrometric proteomics,
antibody/affinity capturing agents, and a knowledgebase,²
embodied by the neXtProt database,⁸ where detailed
information on the human proteome is collated, curated, and
organized for rapid access to information on a query protein.
Our group carried out the functional annotation of missing
Special Issue: Chromosome-centric Human Proteome Project
Received: August 1, 2013
Published: December 6, 2013
Article
pubs.acs.org/jpr
© 2013 American Chemical Society 76 dx.doi.org/10.1021/pr400794x | J. Proteome Res. 2014, 13, 76−83