Assignment of protein function in the postgenomic era
Alan Saghatelian & Benjamin F Cravatt
Genome sequencing projects have provided researchers with an unprecedented boon of molecular information that promises to
revolutionize our understanding of life and lead to new treatments of its disorders. However, genome sequences alone offer only
limited insights into the biochemical pathways that determine cell and tissue function. These complex metabolic and signaling
networks are largely mediated by proteins. The vast number of uncharacterized proteins found in prokaryotic and eukaryotic
systems suggests that our knowledge of cellular biochemistry is far from complete. Here, we highlight a new breed of ‘postgenomic’
methods that aim to assign functions to proteins through the integrated application of chemical and biological techniques.
One of the preeminent challenges facing twenty-first century scientists
is the determination of the molecular, cellular and (patho)physiological
functions of the numerous proteins encoded by eukaryotic and prokaryotic
genomes. It is clear that this problem cannot be addressed by the
analysis of genome sequences alone. Indeed, a significant fraction of the
genes in every genome sequenced to date, from bacterial to human, encodes
uncharacterized proteins. Even those proteins for which tentative or
apparently robust assignments of activity can be made are susceptible to
reinterpretation as our understanding of the molecular complexity of life
increases. For example, it is now clear that proteins with highly related
sequences can perform different functions in vivo, and, conversely,
proteins may show similar activities while lacking discernible sequence
or structural similarity¹. The additional realization that one gene can
code for tens if not hundreds of different proteins as a result of
post-transcriptional and post-translational processing and modification
indicates that the number of proteins in need of characterization vastly
exceeds the number of genes in the genome². These various issues combine
to present a formidable, but exciting, set of experimental problems for
contemporary biological and chemical researchers, all of which distill
down to a single question: how should we undertake the assignment of
protein function in the postgenomic era?
The biochemical properties of proteins are typically determined
in vitro with purified material. Although this classical ‘test tube
biochemistry’ approach has succeeded in explaining the activities of many
proteins, it suffers from several shortcomings. First, proteins do not
function in isolation in vivo, but rather as parts of complex metabolic
and signaling networks. Proteins are also regulated by post-translational
mechanisms in vivo, including covalent modification and protein-protein
interactions (Fig. 1). These dynamic events make protein performance in
living systems context dependent in ways that can be difficult, if not
impossible, to replicate in vitro. Second, from a methodological
perspective, in vitro studies are biased toward proteins that are easier
to express recombinantly and for which activity assays are available. As
such, many challenging classes of proteins, including membrane-associated
and uncharacterized proteins, are not effectively addressed by classical
methods. Finally, the examination of individual proteins ‘one at a time’
is an uncomfortably slow process, especially when confronted with the
thousands of proteins currently in need of functional characterization.
The limitations of classical biochemical methods have inspired the
development of tools and technologies that can directly and
systematically characterize the activities of proteins in complex
biological settings. Among these ‘systems biology’ approaches, genetic
techniques such as targeted gene disruption (gene ‘knockouts’³) and RNA
interference (RNAi)⁴ have proven particularly powerful, owing to their
remarkable specificity and generality (Fig. 2). Nonetheless, genetic
methods suffer from a lack of temporal control over protein activity and
are not well suited for dissecting the functions of different protein
isoforms. The subject of this review is a distinct breed of systems
biology methods that take advantage of chemistry, often in combination
with genetic, protein biochemical and analytical techniques, to tackle
the problem of characterizing protein function on a global scale
(Fig. 2). As will be seen, a common trait of these methods is that they
are truly ‘postgenomic’ in the sense that they require complete genome
sequences for optimal performance. Thus, although the limitations of
relying solely on genome sequences will emerge as a recurring theme,
these comments should not be viewed as a negative critique of genome
sequencing efforts. On the contrary, the chemical biology approaches
described herein provide perhaps the most profound endorsement of the
value of complete genome sequences, which are enabling us to make
experimental inquiries never before deemed possible.
Genomic methods for the assignment of protein function
Prokaryotic organisms provide perhaps the richest and most diverse
source of metabolic enzymes on the planet. These enzymes often
The Skaggs Institute for Chemical Biology and Departments of Cell
Biology and Chemistry, The Scripps Research Institute, 10550 North
Torrey Pines Road, La Jolla, California 92037, USA. Correspondence
should be addressed to B.F.C. (cravatt@scripps.edu).
130 VOLUME 1 NUMBER 3 AUGUST 2005 NATURE CHEMICAL BIOLOGY
REVIEW
© 2005 Nature Publishing Group http://www.nature.com/naturechemicalbiology