LC–MS/MS based proteomic analysis and functional inference of hypothetical proteins in Desulfovibrio vulgaris

Weiwen Zhang a,*, David E. Culley a, Marina A. Gritsenko b, Ronald J. Moore b, Lei Nie c, Johannes C.M. Scholten a, Konstantinos Petritis b, Eric F. Strittmatter b, David G. Camp II b, Richard D. Smith b, Fred J. Brockman a

a Microbiology Group, Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O. Box 999, Richland, WA 99352, USA
b Biological Systems Analysis and Mass Spectrometry Group, Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O. Box 999, Richland, WA 99352, USA
c Department of Biostatistics, Biomathematics, and Bioinformatics, Georgetown University, 4000 Reservoir Road, Washington, DC 20057, USA

Received 1 September 2006
Available online 15 September 2006

Abstract

High efficiency capillary liquid chromatography–tandem mass spectrometry (LC–MS/MS) was used to examine the proteins extracted from Desulfovibrio vulgaris cells across six treatment conditions. While our previous study provided a proteomic overview of the cellular metabolism based on proteins with known functions [W. Zhang, M.A. Gritsenko, R.J. Moore, D.E. Culley, L. Nie, K. Petritis, E.F. Strittmatter, D.G. Camp II, R.D. Smith, F.J. Brockman, A proteomic view of the metabolism in Desulfovibrio vulgaris determined by liquid chromatography coupled with tandem mass spectrometry, Proteomics 6 (2006) 4286–4299], this study describes the global detection and functional inference for hypothetical D. vulgaris proteins. Using criteria that a given peptide of a protein is identified from at least two out of three independent LC–MS/MS measurements and that for any protein at least two different peptides are identified among the three measurements, 129 open reading frames (ORFs) originally annotated as hypothetical proteins were found to encode expressed proteins.
Functional inference for the conserved hypothetical proteins was performed by a combination of several non-homology based methods: genomic context analysis, phylogenomic profiling, and analysis of a combination of experimental information, including peptide detection in cells grown under specific culture conditions and cellular location of the proteins. Using this approach, we were able to assign possible functions to 20 conserved hypothetical proteins. This study demonstrated that a combination of proteomics and bioinformatics methodologies can provide verification of the expression of hypothetical proteins and improve genome annotation.
© 2006 Published by Elsevier Inc.

Keywords: Mass spectrometry; Hypothetical proteins; Function; Desulfovibrio vulgaris

0006-291X/$ - see front matter © 2006 Published by Elsevier Inc.
doi:10.1016/j.bbrc.2006.09.019
* Corresponding author. Fax: +1 509 376 1632. E-mail address: Weiwen.Zhang@pnl.gov (W. Zhang).

Due to improvements in DNA sequencing technologies, more than 300 microbial genomes from almost all known major phylogenetic lineages have been fully sequenced, and many more are nearing completion. A crucial step in genome sequencing projects involves assigning functions to open reading frames (ORFs). Currently, the prediction of protein function is based primarily on sequence homology [1]. Using homology search methodologies, functional assignments have been achieved for over 60–70% of predicted proteins in some genomes. The remaining genes are either homologous to previously identified genes with unknown function, or do not have any known homologs. The proteins encoded by these genes are generally referred to as “hypothetical proteins”. A recent survey of 120 genomes showed that one out of three proteins in the NCBI protein database has now been annotated as hypothetical [2]. The elucidation of the function of these hypothetical proteins has
www.elsevier.com/locate/ybbrc Biochemical and Biophysical Research Communications 349 (2006) 1412–1419 BBRC