Cite this: Mol. BioSyst., 2012, 8, 3036–3048

Detecting protein complexes in a PPI network: a gene ontology based multi-objective evolutionary approach†

Anirban Mukhopadhyay, Sumanta Ray* and Moumita De

Received 27th March 2012, Accepted 14th August 2012
DOI: 10.1039/c2mb25302j
This journal is © The Royal Society of Chemistry 2012

Protein complexes play an important role in cellular mechanisms. Identifying protein complexes in protein–protein interaction (PPI) networks is the first step in understanding the organization and dynamics of cell function. Several high-throughput experimental techniques produce large amounts of protein interaction data, which can be used to predict protein complexes in a PPI network. We have developed an algorithm, PROCOMOSS (Protein Complex Detection using Multi-objective Evolutionary Approach based on Semantic Similarity), for partitioning the whole PPI network into clusters, which serve as predicted protein complexes. As objective functions, we consider both graphical properties of a PPI network and biological properties based on GO semantic similarity measures. Three different semantic similarity measures are used for grouping functionally similar proteins in the same clusters. We have applied the PROCOMOSS algorithm to two different datasets of Saccharomyces cerevisiae to find and predict protein complexes. A real-life application of PROCOMOSS is also shown by applying it to the human PPI network consisting of differentially expressed genes affected by gastric cancer. Gene ontology and pathway based analyses are also performed to investigate the biological importance of the extracted gene modules.

1 Introduction

A PPI network can be described as a complex system of proteins linked by interactions.
The simplest representation takes the form of an undirected graph consisting of nodes and edges,¹ where proteins are represented as nodes and an interaction between two proteins is represented by an edge connecting the corresponding nodes. The protein complexes in a PPI network are assemblages of proteins that interact with each other at a given time and place, forming a dense region in the PPI network.

Several techniques based on graph clustering, finding dense regions, or clique finding have been proposed to discover protein complexes in PPI networks.²⁻⁵ Molecular Complex Detection (MCODE), proposed by Bader et al.,⁶ detects densely connected regions in the PPI network by assigning each vertex a weight corresponding to its local neighborhood density. Then, starting from the highest-weighted vertex (the seed vertex), it recursively adds to the cluster the vertices whose weight is above a given threshold. The Markov Cluster algorithm (MCL) proposed in ref. 7 converges toward a partitioning of the graph, with a set of high-flow regions (the clusters) separated by boundaries with no flow. In ref. 8, Restricted Neighborhood Search Clustering (RNSC), a cost-based local search algorithm, is proposed; it explores the solution space to minimize a cost function calculated from the numbers of intra-cluster and inter-cluster edges. Starting from an initial random solution, RNSC iteratively moves a vertex from one cluster to another if the move reduces the overall cost.

Recently, clustering with overlapping neighborhood expansion (ClusterONE) was introduced in ref. 9 for detecting potentially overlapping protein complexes from protein–protein interaction data. This algorithm consists of three major steps: first, starting from a single seed vertex, a greedy procedure adds or removes vertices to find groups with high cohesiveness.
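The greedy first step can be illustrated with a minimal Python sketch. It assumes an unweighted network stored as adjacency sets and uses a simplified cohesiveness score (internal edges divided by internal plus boundary edges). This score is a stand-in chosen for illustration and is not necessarily the definition used in ref. 9. The names `cohesiveness`, `grow_from_seed` and `toy` are hypothetical.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected PPI network as adjacency sets


def cohesiveness(graph: Graph, group: Set[str]) -> float:
    """Assumed simplified score: internal / (internal + boundary) edges."""
    internal = 0
    boundary = 0
    for v in group:
        for u in graph[v]:
            if u in group:
                internal += 1  # each internal edge is counted from both ends
            else:
                boundary += 1
    internal //= 2
    total = internal + boundary
    return internal / total if total else 0.0


def grow_from_seed(graph: Graph, seed: str) -> Set[str]:
    """Greedily add or remove one vertex at a time while the score improves."""
    group = {seed}
    while True:
        best_score = cohesiveness(graph, group)
        best_group = None
        # Candidate additions: vertices adjacent to the current group.
        frontier = set().union(*(graph[v] for v in group)) - group
        candidates = [group | {u} for u in sorted(frontier)]
        # Candidate removals: any member, keeping the group non-empty.
        if len(group) > 1:
            candidates += [group - {v} for v in sorted(group)]
        for cand in candidates:
            score = cohesiveness(graph, cand)
            if score > best_score:  # strict improvement guarantees termination
                best_score, best_group = score, cand
        if best_group is None:
            return group
        group = best_group


# Toy network: two triangles (A, B, C) and (D, E, F) joined by the edge C-D.
toy: Graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}

print(sorted(grow_from_seed(toy, "A")))  # ['A', 'B', 'C']
```

On this toy network the procedure stops at the triangle containing the seed, because adding the vertex across the bridge lowers the score from 0.75 to about 0.67. With this simplified score a whole connected component scores 1.0, so a practical variant would likely need a size penalty or edge weights.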
In the second step, pairs of groups whose overlap score is above a specified threshold are merged. In the third step, postprocessing discards complex candidates that contain fewer than three proteins or whose density is below a given threshold.

Department of Computer Science and Engineering, University of Kalyani, Kalyani, India. E-mail: anirban@klyuniv.ac.in, sumanta_ray86@rediffmail.com, moumita.de2013@gmail.com
† Electronic supplementary information (ESI) available: The code and other related materials are available at http://kucse.in/procomoss/. See DOI: 10.1039/c2mb25302j
Molecular BioSystems, www.rsc.org/molecularbiosystems

In general, it has been observed that the proteins constituting a complex are functionally similar and carry out some common biological activity. Motivated by this observation, this article develops a multi-objective algorithm, PROCOMOSS (Protein Complex Detection using Multi-objective Evolutionary Approach based on Semantic Similarity). PROCOMOSS simultaneously optimizes a graph-based density metric and a GO semantic similarity based metric to find dense protein