Random Graph Models and Their Application to Protein-Protein Interaction Networks*

Desmond J. Higham†    Nataša Pržulj‡

October 6, 2010

* This manuscript appears as University of Strathclyde Mathematics and Statistics Research Report 18 (2010). It was written in response to an invitation to prepare a chapter for the Handbook of Statistical Systems Biology, edited by David Balding, Mark Girolami and Michael Stumpf, to be published by Wiley.
† Department of Mathematics and Statistics, University of Strathclyde, UK
‡ Department of Computing, Imperial College London, UK

1 Background and Motivation

Research in the analysis of large-scale biological data sets is a modern, evolving topic, driven by recent technological advances in experimental techniques [7, 11, 34]. It presents many fascinating challenges and offers computational scientists the opportunity to contribute directly to biological understanding and therapeutics. This chapter focuses on a class of data that is particularly straightforward from a mathematical perspective. However, when studying such data it is necessary to be aware of the many simplifications and compromises involved in reducing biological reality to such a streamlined summary.

A protein-protein interaction (PPI) network takes the form of a large, sparse, undirected graph. Nodes represent proteins, and an edge joins two proteins whose physical interaction has been observed. Figure 1 gives a simple schematic picture of how a set of physical interactions might arise among eight types of protein, labelled A to H. Figure 2 then shows the resulting (unweighted, undirected) PPI network. More realistically, a PPI network for a particular organism will involve several thousand proteins, and tens of
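To make the graph abstraction concrete, the following Python sketch (our own illustration, not code from the authors) stores a PPI network as an undirected, unweighted graph. The protein labels A to H follow the figures, but the interaction list is hypothetical and is not the one shown in Figure 1. The helper names `build_ppi_graph`, `adjacency_matrix`, `edge_count` and `density` are likewise our own. The sketch shows two points from the text: an undirected graph has a symmetric 0/1 adjacency matrix, and repeated observations of the same protein pair collapse to a single unweighted edge. It assumes a simple graph, so self-interactions are dropped.

```python
def build_ppi_graph(proteins, interactions):
    """Return an undirected, unweighted graph as a dict of neighbour sets.

    Each observed interaction (u, v) is stored in both directions, so the
    graph is undirected; using sets means repeated observations of the same
    pair collapse to one edge (unweighted). Assumption for this sketch:
    self-interactions (u == v) are ignored, giving a simple graph.
    """
    adj = {p: set() for p in proteins}
    for u, v in interactions:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def adjacency_matrix(proteins, adj):
    """Symmetric 0/1 matrix with rows and columns ordered as in `proteins`."""
    return [[1 if q in adj[p] else 0 for q in proteins] for p in proteins]


def edge_count(adj):
    # Each undirected edge appears in exactly two neighbour sets.
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def density(adj):
    # Fraction of the n(n-1)/2 possible protein pairs that are edges.
    n = len(adj)
    return edge_count(adj) / (n * (n - 1) / 2)


proteins = list("ABCDEFGH")
# Hypothetical interactions for illustration only; NOT those of Figure 1.
# ("B", "A") repeats ("A", "B") to show that duplicates collapse.
interactions = [("A", "B"), ("B", "A"), ("A", "C"), ("B", "C"), ("C", "D"),
                ("E", "F"), ("F", "G"), ("G", "H"), ("E", "H")]

adj = build_ppi_graph(proteins, interactions)
A = adjacency_matrix(proteins, adj)
print(edge_count(adj))         # 8
print(round(density(adj), 3))  # 0.286
```

Storing neighbour sets rather than the full matrix is the natural choice for a large, sparse network: memory grows with the number of edges rather than with the square of the number of proteins, and the density of a realistic PPI network with several thousand proteins is correspondingly small.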