Random Graph Models and Their Application to Protein-Protein Interaction Networks

Desmond J. Higham (Department of Mathematics and Statistics, University of Strathclyde, UK)
Nataša Pržulj (Department of Computing, Imperial College London, UK)

October 6, 2010

This manuscript appears as University of Strathclyde Mathematics and Statistics Research Report 18 (2010). It has been written in response to the invitation to prepare a chapter for the Handbook of Statistical Systems Biology, edited by David Balding, Mark Girolami and Michael Stumpf, to be published by Wiley.

1 Background and Motivation

Research in the analysis of large-scale biological data sets is a modern, evolving topic, driven by recent technological advances in experimental techniques [7, 11, 34]. It presents many fascinating challenges and offers computational scientists the possibility of contributing directly to biological understanding and therapeutics. This chapter focuses on a class of data that is particularly straightforward when viewed from a mathematical perspective. However, when studying such data it is necessary to be aware of the many simplifications and compromises that were involved in reaching such a streamlined summary of biological reality.

A protein-protein interaction (PPI) network takes the form of a large, sparse and undirected graph. Nodes represent proteins and edges represent observed physical interaction between protein pairs. Figure 1 gives a simple schematic picture of how a set of physical interactions might arise between eight types of protein, labelled A to H. Figure 2 then shows the resulting (unweighted, undirected) PPI network. More realistically, for a particular organism a PPI network will involve several thousand proteins, and tens of