Assignment of protein function in the postgenomic era

Alan Saghatelian & Benjamin F Cravatt

Genome sequencing projects have provided researchers with an unprecedented boon of molecular information that promises to revolutionize our understanding of life and lead to new treatments of its disorders. However, genome sequences alone offer only limited insights into the biochemical pathways that determine cell and tissue function. These complex metabolic and signaling networks are largely mediated by proteins. The vast number of uncharacterized proteins found in prokaryotic and eukaryotic systems suggests that our knowledge of cellular biochemistry is far from complete. Here, we highlight a new breed of ‘postgenomic’ methods that aim to assign functions to proteins through the integrated application of chemical and biological techniques.

One of the preeminent challenges facing twenty-first-century scientists is the determination of the molecular, cellular and (patho)physiological functions of the numerous proteins encoded by eukaryotic and prokaryotic genomes. It is clear that this problem cannot be addressed by the analysis of genome sequences alone. Indeed, a significant fraction of every genome sequenced to date, from bacterial to human, is composed of uncharacterized proteins. Even those proteins for which tentative or apparently robust assignments of activity can be made are susceptible to reinterpretation as our understanding of the molecular complexity of life increases. For example, it is now clear that proteins with highly related sequences can perform different functions in vivo, and, conversely, proteins may show similar activities while lacking discernible sequence or structural similarity¹.
The additional realization that one gene can code for tens if not hundreds of different proteins as a result of post-transcriptional and post-translational processing and modification indicates that the number of proteins in need of characterization vastly exceeds the number of genes in the genome². These various issues combine to present a formidable, but exciting, set of experimental problems for contemporary biological and chemical researchers, all of which distill down predominantly to a single question: how should we undertake the assignment of protein function in the postgenomic era?

The biochemical properties of proteins are typically determined in vitro with purified material. Although this classical ‘test tube biochemistry’ approach has succeeded in explaining the activities of many proteins, it does suffer from some shortcomings. First, proteins do not function in isolation in vivo, but rather as parts of complex metabolic and signaling networks. Proteins are also regulated by post-translational mechanisms in vivo, including covalent modification and protein-protein interactions (Fig. 1). These dynamic events create a context dependency to the performance of proteins in living systems that can be difficult, if not impossible, to replicate in vitro. From a methodological perspective, in vitro studies are biased toward proteins that are more straightforward to express recombinantly and for which activity assays are available. As such, many challenging classes of proteins, including membrane-associated and uncharacterized proteins, are not effectively addressed by classical methods. Finally, the examination of individual proteins ‘one at a time’ is an uncomfortably slow process, especially when confronted with the thousands of proteins in current need of functional characterization.
The limitations of classical biochemical methods have inspired the development of tools and technologies that can, in a direct and systematic way, characterize the activities of proteins in complex biological settings. Among these ‘systems biology’ approaches, genetic techniques, such as targeted gene disruption (gene ‘knockouts’³) and RNA interference (RNAi)⁴, have proven particularly powerful, owing to their remarkable specificity and generality (Fig. 2). Nonetheless, genetic methods suffer from a lack of temporal control over protein activity and are not well suited for dissecting the functions of different protein isoforms. The subject of this review is a distinct breed of systems biology methods that take advantage of chemistry, often in combination with genetic, protein biochemistry and analytical techniques, to tackle the problem of characterizing protein function on a global scale (Fig. 2). As will be seen, a common trait of these methods is that they are truly ‘postgenomic’ in the sense that they require complete genome sequences for optimal performance. Thus, although the limitations of pure reliance on genome sequences will emerge as a recurring theme, these comments should not be viewed as a negative critique of genome sequencing efforts. On the contrary, the chemical biology approaches described herein provide perhaps the most profound endorsement of the value of complete genome sequences, which are enabling us to make experimental inquiries never before deemed possible.

The Skaggs Institute for Chemical Biology and Departments of Cell Biology and Chemistry, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA. Correspondence should be addressed to B.F.C. (cravatt@scripps.edu).

Genomic methods for the assignment of protein function

Prokaryotic organisms provide perhaps the richest and most diverse source of metabolic enzymes on the planet. These enzymes often
130 VOLUME 1 NUMBER 3 AUGUST 2005 NATURE CHEMICAL BIOLOGY REVIEW © 2005 Nature Publishing Group http://www.nature.com/naturechemicalbiology