Technical note
The different proteomes of Chlamydomonas reinhardtii
Luis Valledor, Luis Recuenco-Munoz, Volker Egelhofer,
Stefanie Wienkoop, Wolfram Weckwerth⁎
Department of Molecular Systems Biology, University of Vienna, Althanstrasse 14, 1090, Vienna, Austria
ARTICLE INFO
Article history:
Received 23 April 2012
Accepted 30 July 2012
Available online 7 August 2012
ABSTRACT
Protein identification and proteome mapping mostly rely on the combination of tandem mass
spectrometry and sequence database searching. Despite constant improvements in
instrumentation, search algorithms, and genome annotation, little effort has been invested
in estimating the impact of different genome annotation releases on the final results of a
proteome study. We used a large dataset of mass spectra acquired on an LTQ Orbitrap XL
instrument, covering different growth conditions of the model species Chlamydomonas
reinhardtii. More than one million spectra were analyzed with the SEQUEST algorithm
against four databases corresponding to the major Chlamydomonas genome assemblies.
In total, more than 3000 proteins and about 11,000 peptides were identified. A total of
238 proteins were detected exclusively with assembly 3.0, whereas 1222 proteins absent
from the assembly 3.0 results were detectable only with the other databases. The
comparison demonstrates that the choice of database affects not only the number of
identified proteins but also label-free quantitation and the biological interpretation of the
results. Lists of protein accessions exclusively assigned to individual C. reinhardtii genome
assemblies and annotations are provided as a resource for proteogenomic studies.
© 2012 Elsevier B.V. All rights reserved.
Keywords:
Proteogenomics
Genome annotation
Functional annotation
Systems biology
Plant systems biology
PROMEX
Today, peptide identification and proteome mapping rely on the
combination of tandem mass spectrometry and sequence database
searching. In a typical proteomics pipeline, protein identification
follows a peptide-centric approach, in which peptides rather than
intact proteins are identified [1]. Peptides are identified by
matching the acquired MS/MS spectra against a protein sequence
database, and proteins are then inferred from the identified
peptides. Many search engines are available for peptide
identification, and new tools are continually being developed to
improve the analysis by increasing the number of correct
identifications while reducing the number of false positives [2].
Despite these advances, however, the influence of the sequence
database used for the search has rarely been addressed specifically.
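Search engines differ in how they score candidate matches, but false positives are commonly controlled with a target-decoy strategy. This strategy is not described in this note and is used here only as an illustration. The minimal Python sketch below assumes that each peptide-spectrum match (PSM) comes from a search against a concatenated target/decoy database and has been reduced to a (score, is_decoy) pair. The function name `fdr_threshold` and the 1% default limit are hypothetical choices. The false discovery rate (FDR) at a given score cutoff is estimated as the number of decoy matches divided by the number of target matches scoring at or above that cutoff.

```python
from typing import List, Tuple


def fdr_threshold(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> float:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: hypothetical (score, is_decoy) pairs from a concatenated
    target/decoy search, where higher scores are better. The FDR at a
    cutoff is estimated as (decoys >= cutoff) / (targets >= cutoff).
    Returns float("inf") when no cutoff meets the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = float("inf")
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every PSM tied at this score so that ties share one cutoff.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets and decoys / targets <= max_fdr:
            best = score  # a lower cutoff that still meets the FDR limit
    return best


psms = [(50, False), (45, False), (40, True), (38, False), (30, False), (20, True)]
print(fdr_threshold(psms, max_fdr=0.3))  # → 30
```

With these six illustrative PSMs and a 30% limit, the cutoff falls at a score of 30. At that cutoff, four target matches and one decoy match are accepted, for an estimated FDR of 25%. Lowering the cutoff to 20 would raise the estimate to 50%.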
The contents of the database are paramount for protein
identification. The first step of protein identification is the in silico
digestion of every sequence in the database, from which
theoretical MS/MS spectra are generated for every candidate
peptide. These theoretical spectra are then compared with each
experimental spectrum, and a score is calculated for each
candidate peptide [3]. In a second step, the set of identified
peptides is mapped back to the full-length (undigested) protein
sequences in the database to infer which proteins may have been
present [4]. Consequently, the annotated genome database is the
foundation of the whole process, from peptide identification to
the assembly of the corresponding protein list.
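The two steps described above can be illustrated with a deliberately simplified Python sketch. It makes several assumptions:

- a simplified trypsin rule (cleavage after K or R unless the next residue is P);
- unmodified residues;
- singly charged b- and y-ions only;
- a toy shared-peak-count score in place of a real scoring function. SEQUEST, for instance, scores matches by cross-correlation.

All function names, the 0.5 Da fragment tolerance, and the example sequences and accessions are hypothetical.

```python
import re
from typing import Dict, List, Set

# Monoisotopic residue masses (Da); unmodified residues are assumed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # Da
WATER = 18.01056  # Da


def tryptic_digest(protein: str, missed: int = 1, min_len: int = 6) -> List[str]:
    """In silico digestion: cut after K/R unless the next residue is P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join up to `missed` consecutive pieces to model missed cleavages.
        for j in range(i, min(i + missed + 1, len(pieces))):
            peptide = "".join(pieces[i:j + 1])
            if len(peptide) >= min_len:
                peptides.append(peptide)
    return peptides


def fragment_ions(peptide: str) -> List[float]:
    """Theoretical singly charged b- and y-ion m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion of the first i residues
        ions.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return ions


def shared_peaks(observed: List[float], theoretical: List[float], tol: float = 0.5) -> int:
    """Toy score: number of theoretical ions matched by an observed peak."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def infer_proteins(peptides: Set[str], database: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map identified peptides back to every full-length protein containing them."""
    hits: Dict[str, Set[str]] = {}
    for accession, sequence in database.items():
        matched = {p for p in peptides if p in sequence}
        if matched:
            hits[accession] = matched
    return hits


database = {"PROT1": "GGGKPAAARLLLLK", "PROT2": "MMMLLLLKR"}
print(tryptic_digest(database["PROT1"], missed=1, min_len=1))
# → ['GGGKPAAAR', 'GGGKPAAARLLLLK', 'LLLLK']
print(shared_peaks([58.0, 90.1], fragment_ions("GA")))  # → 2
print(infer_proteins({"LLLLK"}, database))
# → {'PROT1': {'LLLLK'}, 'PROT2': {'LLLLK'}}
```

`infer_proteins` reports every database entry that contains a matched peptide. A peptide shared between sequences, such as LLLLK here, therefore supports several accessions at once. This illustrates how the sequences present in the search database constrain which proteins can be reported.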
Large sequence databases containing sequences of many different
species, such as NCBI or UniProt, are classically used for protein
JOURNAL OF PROTEOMICS 75 (2012) 5883 – 5887
⁎ Corresponding author at: Department of Molecular Systems Biology, Faculty of Life Sciences, University of Vienna, Althanstr. 14, 1090,
Vienna, Austria. Tel.: +43 1 4277 577 00, +43 664 60277 577 00 (mobile); fax: +43 1 4277 9 577.
E-mail addresses: luis.valledor@univie.ac.at (L. Valledor), wolfram.weckwerth@univie.ac.at (W. Weckwerth).
1874-3919/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.jprot.2012.07.045