Technical note

The different proteomes of Chlamydomonas reinhardtii

Luis Valledor, Luis Recuenco-Munoz, Volker Egelhofer, Stefanie Wienkoop, Wolfram Weckwerth

Department of Molecular Systems Biology, University of Vienna, Althanstrasse 14, 1090, Vienna, Austria

ARTICLE INFO

Article history:
Received 23 April 2012
Accepted 30 July 2012
Available online 7 August 2012

Keywords:
Proteogenomics
Genome annotation
Functional annotation
Systems biology
Plant systems biology
PROMEX

ABSTRACT

Protein identification and proteome mapping mostly rely on the combination of tandem mass spectrometry and sequence database searching. Despite constant improvements in instrumentation, search algorithms, and genome annotations, little effort has been invested in estimating the impact of different genome annotation releases on the final results of a proteome study. We used a large dataset of mass spectra, obtained with an Orbitrap LTQ XL instrument, that covers different growth conditions of the model species Chlamydomonas reinhardtii. More than one million spectra were analyzed with the SEQUEST algorithm against four different databases corresponding to the major Chlamydomonas genome assemblies. In total, more than 3000 proteins and about 11,000 peptides were identified. 238 proteins were detected exclusively with assembly 3.0, whereas 1222 proteins that were missing from assembly 3.0 could be detected only with the other databases. The comparison of the results demonstrates that database selection affects not only the number of identified proteins but also label-free quantitation and the biological interpretation of the results. Lists of protein accessions exclusively assigned to individual C. reinhardtii genome assemblies and annotations are provided as a resource for proteogenomic studies.

© 2012 Elsevier B.V. All rights reserved.

Nowadays, peptide identification and proteome mapping rely on the combination of tandem mass spectrometry and sequence database searching.
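Because sequence database searching can only report peptides and proteins that the database actually contains, changing the database changes the result. The Python sketch below illustrates this with two hypothetical annotation releases of a toy genome region. Each entry is digested in silico with a simplified trypsin rule (cleave after K or R unless the next residue is P; no missed cleavages, no modifications), observed neutral precursor masses are matched against the digest, and proteins are inferred naively from the matched peptides. All sequences, accessions (P1, P2, P2b), helper names and the 0.01 Da tolerance are illustrative assumptions. Real search engines such as SEQUEST go further and score theoretical fragment (MS/MS) spectra against every experimental spectrum; this sketch deliberately substitutes plain precursor-mass matching for that scoring step.

```python
"""Toy sketch: identifications depend on the sequence database searched.

All sequences, accessions and helper names are hypothetical; precursor-mass
matching stands in for the MS/MS spectrum scoring that SEQUEST performs.
"""
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence, min_len=5):
    """Simplified trypsin rule: cleave after K/R unless the next residue is P.

    No missed cleavages; peptides shorter than min_len are discarded.
    """
    # Zero-width split (lookbehind + negative lookahead); needs Python 3.7+.
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in peptides if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic peptide mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def search(observed_masses, database, tol=0.01):
    """Return {peptide: accessions} for digest peptides within tol Da of an observed mass."""
    index = {}  # peptide -> set of accessions whose digest contains it
    for accession, sequence in database.items():
        for pep in tryptic_digest(sequence):
            index.setdefault(pep, set()).add(accession)
    hits = {}
    for mass in observed_masses:
        for pep, accessions in index.items():
            if abs(peptide_mass(pep) - mass) <= tol:
                hits.setdefault(pep, set()).update(accessions)
    return hits


def infer_proteins(hits):
    """Naive inference: every protein with at least one matched peptide."""
    return set().union(*hits.values())


# Two hypothetical annotation releases; release 2 replaces the gene model
# of P2 with P2b, which differs by one residue (T to S) in its second peptide.
DB_RELEASE_1 = {"P1": "MSTAGKLLEVAPR", "P2": "GDLTEAFKQWTEVR"}
DB_RELEASE_2 = {"P1": "MSTAGKLLEVAPR", "P2b": "GDLTEAFKQWSEVR"}

if __name__ == "__main__":
    # Pretend these two neutral precursor masses were measured in the sample.
    observed = [peptide_mass("LLEVAPR"), peptide_mass("QWTEVR")]
    found_1 = infer_proteins(search(observed, DB_RELEASE_1))
    found_2 = infer_proteins(search(observed, DB_RELEASE_2))
    print("release 1:", sorted(found_1))  # release 1: ['P1', 'P2']
    print("release 2:", sorted(found_2))  # release 2: ['P1']
    print("only with release 1:", sorted(found_1 - found_2))  # ['P2']
```

The same two observed peptides yield two proteins with the first release but only one with the second, because the altered gene model no longer produces the matching peptide. Set differences such as `found_1 - found_2` are one simple way to list accessions identified with only one database, analogous in spirit to the assembly-exclusive protein lists mentioned in the abstract.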
In a typical proteomics pipeline, protein identification follows a peptide-centric approach, which identifies peptides rather than proteins [1]. Peptides are identified by matching the acquired MS/MS spectra against a protein sequence database, and proteins are inferred only after peptide identification. A multitude of search engines are available for identifying the peptides in a sample, and new tools are continually being developed to improve the quality of the analysis by increasing the number of correct identifications while reducing the number of false positives [2]. However, despite these advances, the influence of the database employed has not been specifically addressed in most cases.

The contents of the database are paramount for protein identification. The first step of protein identification is the in silico digestion of all sequences in the database, from which theoretical MS/MS spectra are generated for every candidate peptide. These theoretical spectra are then compared to each experimental spectrum, and a score is calculated for each candidate peptide [3]. In a second step, the set of all identified peptides is compared to the undigested protein database and used to infer which proteins may have been present [4]. Consequently, the annotated genome database is the foundation of the whole process, including peptide identification and assembly of the corresponding proteins.

JOURNAL OF PROTEOMICS 75 (2012) 5883–5887

Corresponding author at: Department of Molecular Systems Biology, Faculty of Life Sciences, University of Vienna, Althanstr. 14, 1090, Vienna, Austria. Tel.: +43 1 4277 577 00, +43 664 60277 577 00 (mobile); fax: +43 1 4277 9 577.
E-mail addresses: luis.valledor@univie.ac.at (L. Valledor), wolfram.weckwerth@univie.ac.at (W. Weckwerth).
1874-3919/$ see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.jprot.2012.07.045
Available online at www.sciencedirect.com
www.elsevier.com/locate/jprot

Large sequence databases containing sequences of different species, such as NCBI or UniProt, are classically used for protein