Unipept: Tryptic Peptide-Based Biodiversity Analysis of
Metaproteome Samples
Bart Mesuere,*,† Bart Devreese,‡ Griet Debyser,‡ Maarten Aerts,§ Peter Vandamme,§ and Peter Dawyndt†
†Department of Applied Mathematics and Computer Science, ‡Laboratory for Protein Biochemistry and Biomolecular Engineering, and §Laboratory for Microbiology, Faculty of Sciences, Ghent University, Krijgslaan 281, B-9000 Ghent, Belgium
Supporting Information
ABSTRACT: The Unipept web application (http://unipept.ugent.be) supports biodiversity
analysis of large and complex metaproteome samples using tryptic peptide information
obtained from shotgun MS/MS experiments. Its underlying index structure is designed to
quickly retrieve all occurrences of a tryptic peptide in UniProtKB records. Taxon-specificity
of the tryptic peptide is subsequently derived from these occurrences using a novel lowest
common ancestor approach that is robust against taxonomic misarrangements,
misidentifications, and inaccuracies. Ignoring this identification noise
would result in a drastic loss of information. Dynamic treemaps visualize the
biodiversity of metaproteome samples, which eases the exploration of samples with highly
complex compositions. The potential of Unipept to gain novel insights into the biodiversity
of a sample is evaluated by reanalyzing publicly available metaproteome data sets taken
from the bacterial phyllosphere and the human gut.
KEYWORDS: metaproteomics, tryptic peptides, biodiversity analysis, treemap visualization
■ INTRODUCTION
The introduction of high-throughput sequencing methods
made it possible to determine the diversity, phylogeny, and
genomic repertoire of complex microbial communities such as
the human gut microbiome. Recently, the MetaHIT consortium
released metagenomic sequence information showing approximately
1,000 different species commonly found in fecal
samples, on average accounting for half a million genes in
addition to the human genome [1]. While metagenomics provides
a wealth of information on the global gene content,
understanding the actual functional contribution to nutrient
conversion or immune system development of individual genes
or organisms requires functional genomics tools.
High-quality multidimensional liquid chromatography in
combination with shotgun tandem mass spectrometry
is currently used to reveal the protein complement of
the metagenome, providing information on the core functional
components [2,3]. In a typical single-species proteomics experiment,
protein identification from shotgun MS/MS data of
tryptic peptides relies on matching the spectra to in silico
calculated spectral information from proteins predicted from
isolate or metagenomic databases. To this end, tryptic digests
and fragment ions of all protein sequences available for
this organism are simulated. In the worst case, protein
identification must rely on cross-species identification,
typically using a close homologue.
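The in silico tryptic digest mentioned above can be illustrated with a minimal Python sketch. It assumes the commonly used cleavage rule for trypsin (cut after K or R unless the next residue is P) and no missed cleavages. The function name `tryptic_digest` and the length bounds `min_len` and `max_len` are hypothetical choices made for illustration, not parameters taken from this paper.

```python
import re


def tryptic_digest(protein: str, min_len: int = 5, max_len: int = 50) -> list[str]:
    """Return the tryptic peptides of a protein sequence.

    Assumption: trypsin cleaves C-terminal to K or R, except when the
    next residue is P. Missed cleavages are not modeled.
    The length bounds are illustrative defaults only.
    """
    # Zero-width split point: preceded by K or R and not followed by P.
    fragments = re.split(r"(?<=[KR])(?!P)", protein.upper())
    # Discard empty fragments and peptides outside the length window.
    return [p for p in fragments if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    # The K in "MAGICK" is followed by P, so no cleavage occurs there.
    print(tryptic_digest("MAGICKPEPTIDERSAMPLEK"))
```

Each resulting peptide would then be the unit that is matched against measured spectra or looked up in a peptide index.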
In metaproteomics approaches, MS/MS-based identification
is hampered by several aspects. A first problem is the limited
coverage of the curated protein databases, e.g., UniProtKB/Swiss-Prot [4].
Ideally, a protein complement of a synthetic
metagenomic database containing sequences from different
metagenomics experiments covering a wide range of organisms
expected in the environment of interest could be created.
Metagenomic databases, however, are growing exponentially,
and naive six-frame translation and protein prediction would
lead to a high false discovery rate or low protein identification
efficiency. Rooijers et al. [5] countered this problem by
implementing an iterative workflow combining the use of a
defined synthetic metagenome and a non-annotated
metagenome repository.
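To make concrete why naive six-frame translation inflates the search space, the following Python sketch translates a DNA fragment in all six reading frames using the standard genetic code. It is an illustration only, not the workflow of Rooijers et al.; the function names are hypothetical, and codons containing ambiguous bases are rendered as "X" by assumption.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna: str) -> list[str]:
    """Translate three forward frames, then three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


if __name__ == "__main__":
    for frame in six_frame_translation("ATGGCCAAGTAA"):
        print(frame)
```

Every input fragment yields six candidate protein sequences, at most one of which normally corresponds to a real gene product, so the number of spurious candidates grows with the size of the metagenome.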
A more specific problem for functional analysis of the
metaproteome is the missing link between tryptic peptides
and their organism of origin. Many tryptic peptide sequences are
conserved across different bacterial taxa and are therefore not
informative for describing the taxonomic diversity or functional
properties of the sample. Askenazi et al. [6] developed the
Pep2Pro web service to identify taxon-specific peptides.
However, they used a restrictive definition of taxon-specificity
by retaining only peptides unique to a single taxon as defined in
the NCBI taxonomy. In this paper, we present Unipept
(http://unipept.ugent.be), a web application that supports
biodiversity analysis of large and complex metaproteome
samples using tryptic peptide information obtained from
shotgun MS/MS experiments. Its underlying index structure
is designed to quickly retrieve all occurrences of a tryptic
peptide in UniProtKB records. Taxon-specificity of the tryptic
peptide is subsequently derived from these occurrences using a
Received: June 27, 2012
Published: November 15, 2012
Article
pubs.acs.org/jpr
© 2012 American Chemical Society 5773 dx.doi.org/10.1021/pr300576s | J. Proteome Res. 2012, 11, 5773-5780