Mini Review

Evaluation of different domain-based methods in protein interaction prediction

Hung Xuan Ta *, Liisa Holm

Institute of Biotechnology, PO Box 56, University of Helsinki, 00014 Helsinki, Finland

Article info

Article history: Received 18 September 2009; Available online 2 October 2009

Keywords: Protein–protein interactions; Domain–domain interactions; Association method; Maximum likelihood estimation; Parsimonious explanation

Abstract

Protein–protein interactions (PPIs) play an important role in many biological functions. PPIs typically involve binding between domains, the basic units of protein folding, evolution and function. Identifying domain–domain interactions (DDIs) would aid the understanding of PPI networks. Recently, many computational methods have aimed to infer DDIs from databases of interacting proteins and then to use the inferred DDIs to predict new PPIs. We systematically describe current domain-based approaches, including the association method, maximum likelihood estimation and the parsimonious explanation method. The performance of these methods at inferring DDIs and predicting PPIs was evaluated comparatively. We observe that each method generates artefacts in certain situations and discuss biases in the available benchmark sets.

© 2009 Elsevier Inc. All rights reserved.

Introduction

Proteins carry out their functions by interacting with each other to keep cells functioning [1]. Unraveling PPIs, one of the central goals of proteomics, will help decipher the molecular mechanisms underlying biological functions and, in turn, improve approaches to drug discovery. Although experimental approaches, including yeast two-hybrid [2,3] and mass spectrometry methods [4–7], have identified a large number of PPIs, the resulting PPI datasets overlap only weakly and contain a relatively high level of false positives [8,9]. Moreover, these experimental techniques are expensive, time-consuming and labor-intensive.
Computational methods have been developed to exploit and extend the protein interactomes. One category comprises approaches based on domains, the evolutionarily and structurally conserved building blocks of proteins [10,11]. These approaches share the assumptions that PPIs are mediated by domains and that all members of a domain class behave the same way. They infer potential DDIs from a training set of PPIs (deconstruction phase) and then use these potential DDIs to predict PPIs in testing sets (prediction phase). More abstractly, observed interactions can be generalized by mapping them to a protein classification (upcasting), and new interactions can then be inferred between all the members of the interacting classes (downcasting) [12]. The key assumption is that the interaction property is conserved within a class. If the class is too large, downcasting can generate gross over-prediction.

Sprinzak and Margalit used the association (AS) method to identify pairs of correlated sequence-signatures (domains) that co-occur in PPIs more frequently than expected by chance [13]. In 2002, Kim et al. developed a statistical scoring system, derived from the occurrence frequency of domain pairs, to infer DDIs from a set of observed PPIs [14]. The association approach was later modified by integrating multiple data sources [15] or by considering pairs of domain combinations instead of single-domain pairs [16]. In another direction, some studies used the Maximum Likelihood Estimation (MLE) technique [17] or its modifications [18–21] to calculate the interaction probability for all possible domain pairs in an observed PPI dataset. Guimaraes et al. used a parsimony explanation (PE) approach, formulated as a linear programming (LP) problem, to derive statistical scores for DDIs [22].

In this review, we systematically evaluate the AS, MLE and PE methods at inferring DDIs and predicting PPIs.
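The deconstruction and prediction phases described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this review: the ratio score (interacting protein pairs containing a domain pair, divided by all protein pairs containing it) is a simplified stand-in for the co-occurrence statistic of [13], the noisy-OR combination in the prediction phase is an assumption, and the function names and the toy proteins are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def domain_pairs(domains_a, domains_b):
    """Return the unordered domain pairs spanned by two proteins."""
    return {tuple(sorted(pair)) for pair in product(domains_a, domains_b)}


def infer_ddi_scores(protein_domains, interactions):
    """Deconstruction phase (simplified association-style score).

    score = (interacting protein pairs containing the domain pair)
          / (all protein pairs containing the domain pair)
    Self-interactions are ignored in this sketch.
    """
    interacting = {frozenset(pair) for pair in interactions}
    in_ppi = Counter()
    in_any = Counter()
    for a, b in combinations(sorted(protein_domains), 2):
        for pair in domain_pairs(protein_domains[a], protein_domains[b]):
            in_any[pair] += 1
            if frozenset((a, b)) in interacting:
                in_ppi[pair] += 1
    return {pair: in_ppi[pair] / total for pair, total in in_any.items()}


def predict_ppi(domains_a, domains_b, ddi_scores):
    """Prediction phase: noisy-OR over domain pairs (an assumption here).

    Two proteins are predicted to interact if at least one of their
    domain pairs interacts; unseen domain pairs contribute a score of 0.
    """
    prob_no_interaction = 1.0
    for pair in domain_pairs(domains_a, domains_b):
        prob_no_interaction *= 1.0 - ddi_scores.get(pair, 0.0)
    return 1.0 - prob_no_interaction


# Hypothetical toy training set: four proteins, two observed PPIs.
protein_domains = {"P1": {"A"}, "P2": {"B"}, "P3": {"A", "C"}, "P4": {"B"}}
interactions = [("P1", "P2"), ("P3", "P4")]

scores = infer_ddi_scores(protein_domains, interactions)
print(scores[("A", "B")])                    # 0.5
print(predict_ppi({"A", "C"}, {"B"}, scores))  # 0.75
```

In this toy example, the domain pair (A, B) occurs in four protein pairs, two of which interact, so it scores 0.5. Every protein carrying domain A is then predicted to interact with every protein carrying domain B at that level or higher, which shows how downcasting from a large domain class can over-predict.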
We point out artefacts of each method and discuss biases in the available benchmark sets. This paper is organized as follows. The section ‘Materials and methods’ defines the benchmark datasets and reviews the AS, MLE and PE methods. The section ‘Results’ compares these methods on the benchmarks; it also assesses the feasibility of the domain decomposition approach and shows conflicting results for DDI versus PPI benchmarks. The reasons are discussed in the section ‘Discussion’.

Abbreviations: PPI, protein–protein interaction; DDI, domain–domain interaction; AS, association; MLE, maximum likelihood estimation; PE, parsimony explanation; LP, linear programming; PTS, positive testing set; NTS, negative testing set

* Corresponding author. E-mail address: xuanhung.ta@helsinki.fi (H.X. Ta).

Biochemical and Biophysical Research Communications 390 (2009) 357–362. 0006-291X/$ - see front matter. doi:10.1016/j.bbrc.2009.09.130. Journal homepage: www.elsevier.com/locate/ybbrc

Materials and methods

Domain decomposition

All the methods were implemented in-house, and were trained and evaluated using the same domain definitions. Each interacting