The completion, or near completion, of the sequencing phase of genome projects has ushered in the age of proteomics, the study of all gene products in an organism. This flood of sequence information has been coupled with recent advances in molecular and structural biology. Together these have led to the concept of 'structural proteomics' or 'structural genomics': the determination of three-dimensional protein structures on a genome-wide scale.

An important use of three-dimensional structural information is to uncover clues to a protein's function that cannot be detected by sequence analysis¹,². This application of structural proteomics is driven by the realization that fewer than 30% of all predicted eukaryotic proteins have a known function. A related use is to determine enough three-dimensional structures to define a 'basic parts list' of protein folds³,⁴. Most other structures could then be modeled from this basis set using computational techniques³,⁵. The long-term goal is to determine experimental structures for all proteins. Subtle differences in protein structure contribute to the diversity and complexity of life, and current modeling techniques are not yet accurate enough to reveal these subtleties⁶.
Here we report a prototype structural proteomics study of 424 nonmembrane proteins from the proteome of Methanobacterium thermoautotrophicum ΔH (M.th.). The primary goals of this research are:
- to evaluate the technical hurdles involved in such a high-throughput project;
- to estimate the percentage of proteins encoded by a genome that are immediately amenable to structure analysis;
- to assess the extent to which function can be inferred from structure.
nature structural biology • volume 7 number 10 • october 2000 903
Target selection
M.th. is a thermophilic archaeon whose genome comprises 1,871 open reading frames (ORFs)⁷. Archaeal proteins share many sequence and functional features with eukaryotic proteins. They are often smaller and more robust, however, and thus serve as excellent model systems for complex processes.

Our target selection scheme applied only two exclusion criteria (Fig. 1). First, membrane-associated proteins were excluded. These make up ∼30% (267–422 of 1,871 ORFs) of the M.th. proteome. This class of proteins is of great biological significance, but membrane protein structure determination has not yet progressed to the point where high-throughput approaches are practical. Second, proteins that had clear homologs in the Protein Data Bank (PDB) were excluded (∼27%
¹Ontario Cancer Institute and Department of Medical Biophysics, University of Toronto, 610 University Ave., Toronto, Ontario, Canada M5G 2M9.
²These two authors contributed equally to this work.
³Present address: Integrative Proteomics Inc., Toronto, Ontario, Canada.
⁴Department of Molecular Biophysics and Biochemistry and Computer Science, PO Box 208114, Yale University, New Haven, Connecticut 06520, USA.
⁵Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratories, EMSL 2569 K8-98, Richland, Washington 99352, USA.
⁶Departments of Biochemistry and Molecular Biology, Chemistry, and the Biotechnology Laboratory, 2146 Health Sciences Mall, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z3.
⁷Biotechnology Research Institute, National Research Council of Canada, 6100 Royalmount Ave., Montreal, Quebec, Canada H4P 2R2.
⁸Department of Biochemistry and Montreal Joint Centre for Structural Biology, McGill University, 3655 Promenade Sir William Osler, Montreal, Quebec, Canada H3G 1Y6.
⁹Department of Biochemistry and ¹⁰Department of Molecular and Medical Genetics, University of Toronto, 1 Kings College Circle, Toronto, Ontario, Canada M5S 1A8.
¹¹Banting and Best Department of Medical Research, C.H. Best Institute, University of Toronto, 112 College St., Toronto, Ontario, Canada M5G 1L6.
Correspondence should be addressed to C.H.A. email: carrow@oci.utoronto.ca or A.M.E. email: aled.edwards@utoronto.ca
Structural proteomics of an archaeon
Dinesh Christendat¹,², Adelinda Yee¹,², Akil Dharamsi¹,³, Yuval Kluger⁴, Alexei Savchenko¹,
John R. Cort⁵, Valerie Booth¹, Cameron D. Mackereth⁶, Vivian Saridakis¹, Irena Ekiel⁷, Guennadi Kozlov⁸,
Karen L. Maxwell⁹, Ning Wu¹, Lawrence P. McIntosh⁶, Kalle Gehring⁸, Michael A. Kennedy⁵,
Alan R. Davidson⁹,¹⁰, Emil F. Pai¹,⁹,¹⁰, Mark Gerstein⁴, Aled M. Edwards¹,¹¹ and Cheryl H. Arrowsmith¹
A set of 424 nonmembrane proteins from Methanobacterium thermoautotrophicum was cloned, expressed and purified for structural studies. Of these, ∼20% were found to be suitable candidates for X-ray crystallographic or NMR spectroscopic analysis without further optimization of conditions. This provides an estimate of the number of readily accessible structural targets in the proteome. A retrospective analysis of the experimental behavior of these proteins suggested some simple relationships between sequence and solubility. This implies that databases of protein properties will be useful in optimizing high-throughput strategies. Of the first 10 structures determined, several provided clues to biochemical functions that were not detectable from sequence analysis. In many cases these putative functions could be readily confirmed by biochemical methods. This demonstrates that structural proteomics is feasible and can play a central role in functional genomics.
Fig. 1 M.th. target ORFs. Histogram of the numbers of M.th. ORFs in each class as a function of predicted protein size. It shows unbiased sampling of nonmembrane proteins of unknown structure.
© 2000 Nature America Inc. • http://structbio.nature.com