CAPER 2.0: An Interactive, Configurable, and Extensible Workflow-Based Platform to Analyze Data Sets from the Chromosome-centric Human Proteome Project
Dan Wang,†,‡,§,# Zhongyang Liu,†,‡,§,# Feifei Guo,†,‡,§,∥,# Lihong Diao,†,‡,§ Yang Li,†,‡,§ Xinlei Zhang,⊥ Zechi Huang,⊥ Dong Li,*,†,‡,§ and Fuchu He*,†,‡,§,∥
† State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, 33 Life Science Park Road, Beijing 100850, China
‡ National Center for Protein Sciences Beijing, 33 Life Science Park Road, Beijing 102206, China
§ National Engineering Research Center for Protein Drugs, 33 Life Science Park Road, Beijing 100850, China
∥ Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, 5 Dong Dan San Tiao, Beijing 100005, China
⊥ Beijing Genestone Technology, Ltd., F21-103, FengLinLvZhou, Kexueyuan Nanli, Datun Road, Beijing 100085, China
ABSTRACT: The Chromosome-centric Human Proteome Project (C-HPP) aims to map and annotate the entire human proteome using a “chromosome-by-chromosome” strategy. As the C-HPP proceeds, the increasing volume of proteomic data sets makes it challenging to perform customized and reproducible bioinformatics analyses for mining biological knowledge. To address this challenge, we upgraded the previous static proteome browser CAPER to a new version, CAPER 2.0, an interactive, configurable, and extensible workflow-based platform for C-HPP data analyses. In addition to the previous visualization functions of track-view and heatmap-view, CAPER 2.0 provides a powerful toolbox for C-HPP data analyses and integrates a configurable workflow system that supports viewing, constructing, editing, running, and sharing workflows. These features allow users to easily conduct their own C-HPP proteomic data analyses and visualization with CAPER 2.0. We illustrate the usage of CAPER 2.0 with four specific workflows: finding missing proteins, mapping peptides to chromosomes for genome annotation, integrating peptides with transcription factor binding sites from ENCODE data sets, and functionally annotating proteins. The updated CAPER is available at http://www.bprc.ac.cn/CAPE.
KEYWORDS: proteomic data analysis platform, user-customized workflow, proteomic data visualization, bioinformatics,
Chromosome-centric Human Proteome Project
■ INTRODUCTION
As an important component of the Human Proteome Project (HPP) established by the Human Proteome Organization (HUPO), the Chromosome-centric Human Proteome Project (C-HPP) was officially launched in Geneva in 2011.¹ The C-HPP aims to identify the entire set of human proteins encoded on each chromosome and to characterize them in terms of abundance, tissue/subcellular localization, post-translational modifications (PTMs), single amino acid variants (SAAVs) generated by nonsynonymous single nucleotide polymorphisms (nsSNPs), interactome, and so on.²,³ To achieve these scientific objectives, the C-HPP consortium has adopted a “chromosome-by-chromosome” international cooperation strategy. All 24 chromosomes and the mitochondria have now been “adopted” by 25 teams from around the world,¹ and the research achievements of the first phase have been published in the 2013 C-HPP special issue of the Journal of Proteome Research.⁴ In particular, the C-HPP consortium is strengthening its cooperation with the Encyclopedia of DNA Elements (ENCODE) Consortium, whose goal is to build a comprehensive parts list of functional elements in the human genome.⁵ This cooperation between the two projects promises to help elucidate how interacting genomic elements such as polygenes, SNPs, and transcription factors control the families of isoforms generated at the protein level.⁶
As the C-HPP proceeds, large numbers of proteomic data sets have been produced.⁴ It is challenging to extract biologically important information from these large-scale,
Special Issue: Chromosome-centric Human Proteome Project
Received: August 1, 2013
Published: November 22, 2013
Article
pubs.acs.org/jpr
© 2013 American Chemical Society | dx.doi.org/10.1021/pr400795c | J. Proteome Res. 2014, 13, 99−106