Unipept: Tryptic Peptide-Based Biodiversity Analysis of Metaproteome Samples

Bart Mesuere,* Bart Devreese, Griet Debyser, Maarten Aerts,§ Peter Vandamme,§ and Peter Dawyndt

Department of Applied Mathematics and Computer Science, Laboratory for Protein Biochemistry and Biomolecular Engineering, and §Laboratory for Microbiology, Faculty of Sciences, Ghent University, Krijgslaan 281, B-9000 Ghent, Belgium

Supporting Information

ABSTRACT: The Unipept web application (http://unipept.ugent.be) supports biodiversity analysis of large and complex metaproteome samples using tryptic peptide information obtained from shotgun MS/MS experiments. Its underlying index structure is designed to quickly retrieve all occurrences of a tryptic peptide in UniProtKB records. Taxon-specificity of the tryptic peptide is successively derived from these occurrences using a novel lowest common ancestor approach that is robust against taxonomic misarrangements, misidentifications, and inaccuracies. Ignoring this identification noise would otherwise result in a drastic loss of information. Dynamic treemaps visualize the biodiversity of metaproteome samples, which eases the exploration of samples with highly complex compositions. The potential of Unipept to gain novel insights into the biodiversity of a sample is evaluated by reanalyzing publicly available metaproteome data sets taken from the bacterial phyllosphere and the human gut.

KEYWORDS: metaproteomics, tryptic peptides, biodiversity analysis, treemap visualization

INTRODUCTION

The introduction of high-throughput sequencing methods made it possible to determine the diversity, phylogeny, and genomic repertoire of complex microbial communities such as the human gut microbiome. Recently, the MetaHIT consortium released metagenomic sequence information showing approximately 1,000 different species commonly found in fecal samples, on average accounting for half a million genes in addition to the human genome.¹ While metagenomics provides a wealth of information on the global gene content, understanding the actual functional contribution of individual genes or organisms to nutrient conversion or immune system development requires functional genomics tools. High-quality multidimensional liquid chromatography in combination with shotgun tandem mass spectrometry is currently used to reveal the protein complement of the metagenome, providing information on the core functional components.²,³

In a typical single-species proteomics experiment, protein identification from shotgun MS/MS data of tryptic peptides relies on matching the spectra to in silico calculated spectral information from proteins predicted from isolate or metagenomic databases. To this end, tryptic digests and fragment ions are simulated for all protein sequences available for this organism. In the worst case, protein identification can be based on cross-species identification, typically using a close homologue.

In metaproteomics approaches, MS/MS-based identification is hampered by several issues. A first problem is the limited coverage of the curated protein databases, e.g., UniProtKB/Swiss-Prot.⁴ Ideally, a protein complement of a synthetic metagenomic database could be created, containing sequences from different metagenomics experiments that cover a wide range of organisms expected in the environment of interest. Metagenomic databases, however, are growing exponentially, and naive six-frame translation and protein prediction would lead to a high false discovery rate or low protein identification efficiency. Rooijers et al.⁵ countered this problem by implementing an iterative workflow combining the use of a defined synthetic metagenome and a non-annotated metagenome repository.

A more specific problem for functional analysis of the metaproteome is the lack of connectivity between the tryptic peptides and their organism of origin.
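To make the two ideas above concrete (in silico tryptic digestion, and the missing link between a peptide and its organism of origin), the following minimal Python sketch digests toy protein sequences and records the organisms in which each peptide occurs. It assumes the commonly used simplified trypsin rule (cleavage after K or R unless the next residue is P) and an arbitrary minimum peptide length. The sequences, organism names, and function names are hypothetical and are not taken from Unipept's implementation or data.

```python
from collections import defaultdict


def tryptic_digest(protein, min_length=5):
    """Split a protein sequence into tryptic peptides.

    Assumed simplified rule: cleave after K or R, except before P.
    min_length is an illustrative cutoff, not a value from the paper.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing fragment that does not end in K or R.
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_length]


def peptide_to_organisms(proteins_by_organism, min_length=5):
    """Map every tryptic peptide to the set of organisms it occurs in."""
    index = defaultdict(set)
    for organism, proteins in proteins_by_organism.items():
        for protein in proteins:
            for peptide in tryptic_digest(protein, min_length):
                index[peptide].add(organism)
    return dict(index)


# Hypothetical toy data: one short protein per organism.
toy_proteins = {
    "organism_A": ["MSTAKLVEELGPRAIDPK"],
    "organism_B": ["MQQRLVEELGPRGGHHK"],
}

index = peptide_to_organisms(toy_proteins)
for peptide, organisms in sorted(index.items()):
    print(peptide, sorted(organisms))
```

In this toy example the peptide LVEELGPR occurs in both organisms, so on its own it cannot be attributed to either one, whereas MSTAK is found only in organism_A.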
Many tryptic peptide sequences are conserved across different bacterial taxa and are therefore not informative for describing the taxonomic diversity or functional properties of the sample. Askenazi et al.⁶ developed the Pep2Pro web service to identify taxon-specific peptides. However, they used a restricted definition of taxon-specificity by retaining only peptides unique to a single taxon as defined in the NCBI taxonomy.

Received: June 27, 2012
Published: November 15, 2012
Article, pubs.acs.org/jpr. © 2012 American Chemical Society. dx.doi.org/10.1021/pr300576s | J. Proteome Res. 2012, 11, 5773-5780

In this paper, we present Unipept (http://unipept.ugent.be), a web application that supports biodiversity analysis of large and complex metaproteome samples using tryptic peptide information obtained from shotgun MS/MS experiments. Its underlying index structure is designed to quickly retrieve all occurrences of a tryptic peptide in UniProtKB records. Taxon-specificity of the tryptic peptide is successively derived from these occurrences using a