The completion and near completion of the sequencing phase of genome projects has ushered in the age of proteomics, the study of all gene products in an organism. This flood of sequence information, coupled with recent advances in molecular and structural biology, has led to the concept of ‘structural proteomics’ or ‘structural genomics’, the determination of three-dimensional protein structures on a genome-wide scale. An important use of three-dimensional structural information of proteins is to uncover clues as to a protein’s function that are not detectable from sequence analysis 1,2. This application of structural proteomics is driven by the realization that <30% of all predicted eukaryotic proteins have a known function. A related use of structural proteomics information is to determine a sufficient number of three-dimensional structures necessary to define a ‘basic parts list’ of protein folds 3,4. Most other structures could then be modeled from this basis set using computational techniques 3,5. The long-term goal is to determine experimental structures for all proteins because it is the subtle differences in protein structure that contribute to the diversity and complexity of life, and current modeling techniques are not yet accurate enough to reveal these subtleties 6.

As reported in this manuscript, we initiated a prototype structural proteomics study of 424 nonmembrane proteins from the proteome of Methanobacterium thermoautotrophicum ΔH (M.th.). The primary goals of this research are to evaluate the technical hurdles involved in such a high-throughput project, to estimate the percentage of proteins encoded by a genome that are immediately amenable to structure analysis, and to assess the extent to which function can be inferred from structure.

nature structural biology • volume 7 number 10 • october 2000 903

Target selection
M.th. is a thermophilic archaeon whose genome comprises 1,871 open reading frames (ORFs) 7.
Archaeal proteins share many sequence and functional features with eukaryotic proteins, but are often smaller and more robust, and thus serve as excellent model systems for complex processes. Only two exclusionary criteria were implemented in our target selection scheme (Fig. 1). First, membrane associated proteins, which comprise 30% (267–422 of 1,871 ORFs) of the M.th. proteome, were excluded. Although this class of proteins is of great biological significance, the science of membrane protein structure determination has not yet progressed to the point at which one would consider high throughput approaches. Second, proteins that had clear homologs in the Protein Data Bank (PDB) were excluded (27%

1 Ontario Cancer Institute and Department of Medical Biophysics, University of Toronto, 610 University Ave, Toronto, Ontario, Canada M5G 2M9.
2 These two authors contributed equally to this work.
3 Present address: Integrative Proteomics Inc., Toronto, Ontario, Canada.
4 Department of Molecular Biophysics and Biochemistry and Computer Science, PO Box 208114, Yale University, New Haven, Connecticut 06520, USA.
5 Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratories, EMSL 2569 K8-98, Richland, Washington 99352, USA.
6 Departments of Biochemistry and Molecular Biology, Chemistry, and the Biotechnology Laboratory, 2146 Health Sciences Mall, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z3.
7 Biotechnology Research Institute, National Research Council of Canada, 6100 Royalmount Ave., Montreal, Quebec, Canada H4P 2R2.
8 Department of Biochemistry and Montreal Joint Centre for Structural Biology, McGill University, 3655 Promenade Sir William Osler, Montreal, Quebec, Canada, H3G 1Y6.
9 Department of Biochemistry and 10 Department of Molecular and Medical Genetics, University of Toronto, 1 Kings College Circle, Toronto, Ontario, Canada M5S 1A8.
11 Banting and Best Department of Medical Research, C.H. Best Institute, University of Toronto, 112 College St., Toronto, Ontario, Canada M5G 1L6.

Correspondence should be addressed to C.H.A. email: carrow@oci.utoronto.ca or A.M.E. email: aled.edwards@utoronto.ca

Structural proteomics of an archaeon

Dinesh Christendat 1,2, Adelinda Yee 1,2, Akil Dharamsi 1,3, Yuval Kluger 4, Alexei Savchenko 1, John R. Cort 5, Valerie Booth 1, Cameron D. Mackereth 6, Vivian Saridakis 1, Irena Ekiel 7, Guennadi Kozlov 8, Karen L. Maxwell 9, Ning Wu 1, Lawrence P. McIntosh 6, Kalle Gehring 8, Michael A. Kennedy 5, Alan R. Davidson 9,10, Emil F. Pai 1,9,10, Mark Gerstein 4, Aled M. Edwards 1,11 and Cheryl H. Arrowsmith 1

A set of 424 nonmembrane proteins from Methanobacterium thermoautotrophicum were cloned, expressed and purified for structural studies. Of these, 20% were found to be suitable candidates for X-ray crystallographic or NMR spectroscopic analysis without further optimization of conditions, providing an estimate of the number of the most accessible structural targets in the proteome. A retrospective analysis of the experimental behavior of these proteins suggested some simple relations between sequence and solubility, implying that data bases of protein properties will be useful in optimizing high throughput strategies. Of the first 10 structures determined, several provided clues to biochemical functions that were not detectable from sequence analysis, and in many cases these putative functions could be readily confirmed by biochemical methods. This demonstrates that structural proteomics is feasible and can play a central role in functional genomics.

Fig. 1 M.th. target ORFs. A histogram representing the numbers of different classes of M.th. ORFs according to predicted protein size showing unbiased sampling of nonmembrane proteins of unknown structure.

© 2000 Nature America Inc. • http://structbio.nature.com