BMB reports http://bmbreports.org

From proteomics toward systems biology: integration of different types of proteomics data into network models

Sangchul Rho#, Sungyong You#, Yongsoo Kim# & Daehee Hwang*

School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, Pohang, Korea

*Corresponding author. Tel: 82-54-279-2393; Fax: 82-54-279-8409; E-mail: Dhhwang@postech.ac.kr
# These authors contributed equally to this work.
Received 25 February 2008
Keywords: Data integration, Network analysis, Network modeling, Proteomics, Systems biology

Living organisms are composed of various systems at different levels, such as organs, tissues, and cells. Each system carries out its diverse functions in response to environmental and genetic perturbations by utilizing biological networks in which nodal components, such as DNA, mRNAs, proteins, and metabolites, closely interact with one another. Systems biology investigates such systems by producing comprehensive global data that represent different levels of biological information (i.e., the DNA, mRNA, protein, or metabolite level) and by integrating these data into network models that generate coherent hypotheses for given biological situations. This review presents a systems biology framework, called the ‘Integrative Proteomics Data Analysis Pipeline’ (IPDAP), which generates mechanistic hypotheses from network models reconstructed by integrating diverse types of proteomic data produced by mass spectrometry-based proteomic analyses. The framework comprises a serial set of computational and network analysis tools. Here, we demonstrate its functionality by applying these tools to several conceptual examples.
[BMB Reports 2008; 41(3): 184-193]

INTRODUCTION

Mammalian organisms consist of various systems, i.e., bodies, organs, tissues, cells, and subcellular compartments, and these systems carry out diverse fundamental activities through different biological networks defined by their nodes (e.g., DNA, mRNAs, proteins, and metabolites in cellular systems) and edges (the various interactions between nodes). When perturbed by an environmental or genetic event (e.g., disease, nutrient changes, or exposure to pathogens and harmful substances), one or more networks, or particular portions of networks called network modules, become activated to execute appropriate functions. Malfunctions of these networks or modules result in failures to respond appropriately and thus give rise to diseases.

Advances in high-throughput technologies that facilitate the probing of both nodes (e.g., abundances of mRNAs, proteins, and metabolites, and the types and degrees of post-translational modifications (PTMs)) and edges (e.g., protein-protein, protein-DNA, chemical-protein, and chemical-DNA interactions, abbreviated as PPIs, PDIs, CPIs, and CDIs, respectively) in biological networks offer new opportunities to understand how living organisms execute necessary functions at the system level via complex networking operations (1, 2). These systems biology approaches to understanding complex network operations typically involve three cardinal processes (3): 1) the generation of global data after a perturbation (e.g., data on mRNA abundances and signaling pathway phosphorylation in diseased systems), 2) the integration of such data into network models that describe key biological events arising from the perturbation (e.g., abnormal signaling in lung cancer systems), and 3) the generation of experimentally testable hypotheses concerning the mechanisms underlying key processes (e.g., mechanisms associated with disease initiation and progression).
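The node-and-edge description above, and the idea of a network module that becomes active after a perturbation, can be made concrete with a small sketch. It assumes a toy network with hypothetical node names, uses the edge-type abbreviations from the text (PPI, PDI, CPI), and treats a "module" simply as a connected group of perturbed nodes. This is an illustrative simplification, not the module-identification method used in IPDAP or in the cited studies.

```python
from collections import defaultdict, deque

# Hypothetical toy network (illustration only, not real data).
# Each edge is (node, node, interaction type) using the abbreviations in the text.
EDGES = [
    ("PROT_A", "PROT_B", "PPI"),   # protein-protein interaction
    ("PROT_B", "PROT_C", "PPI"),
    ("PROT_A", "GENE_A", "PDI"),   # protein-DNA interaction
    ("DRUG_X", "PROT_C", "CPI"),   # chemical-protein interaction
]


def build_adjacency(edges):
    """Undirected adjacency list: node -> set of (neighbor, interaction type)."""
    adj = defaultdict(set)
    for u, v, kind in edges:
        adj[u].add((v, kind))
        adj[v].add((u, kind))
    return adj


def active_modules(adj, perturbed):
    """Connected components of the subnetwork induced by the perturbed nodes.

    A simplifying assumption: a 'module' is any set of perturbed nodes that are
    linked to one another through edges whose both ends are perturbed.
    """
    seen = set()
    modules = []
    for start in sorted(perturbed):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        module = set()
        while queue:  # breadth-first search restricted to perturbed nodes
            node = queue.popleft()
            module.add(node)
            for neighbor, _kind in adj.get(node, ()):
                if neighbor in perturbed and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        modules.append(module)
    return modules


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    # Hypothetical nodes whose abundance or PTM state changed after a perturbation
    perturbed = {"PROT_A", "PROT_B", "GENE_A", "DRUG_X"}
    for module in active_modules(adj, perturbed):
        print(sorted(module))
```

In this toy case the perturbed proteins PROT_A and PROT_B form one module together with GENE_A, while DRUG_X stays isolated because its only partner, PROT_C, is unperturbed. Real module-finding approaches typically also weigh the magnitude of change at each node and the reliability of each edge.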
These three processes are achieved by identifying key network modules and exploring their dynamic transitions after perturbations.

Over the past decade, proteomics studies, together with genomics (e.g., DNA sequencing and ChIP-sequencing) and transcriptomics (e.g., microarrays and Serial Analysis of Gene Expression (SAGE)) studies, have been considered central to the study of biological systems and have enhanced our knowledge of the functions of biological networks (e.g., signaling, gene regulation, and metabolic regulation) by generating a tremendous amount of information on cellular states at different levels. Proteomic technologies can be broadly categorized as antibody-based or mass spectrometry (MS)-based (4, 5). The latter provide a wider spectrum of information than the former on the functional states of proteins, including their abundances, modifications, and interactions at different levels. Thus, proteomics has played crucial roles in systems biology research (2, 6) by providing valuable sets of information for integrative network modeling at the system level. However, despite these advantages, proteomic technologies also suffer from undersampling issues related to the small size of the detected proteome, i.e., they are

Mini Review