Differential Conservation Between Interacting and Non-interacting Homologs Identifies Interface Residues

Qingzhen Hou 1, Bas E. Dutilh 2, Martijn A. Huynen 2, Jaap Heringa 1 and K. Anton Feenstra 1

1 Center for Integrative Bioinformatics (IBIVU), VU University Amsterdam, The Netherlands
2 Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud university medical centre, Nijmegen
Contact: q.hou@vu.nl, feenstra@few.vu.nl, heringa@few.vu.nl

Introduction
Many protein families that take part in protein-protein interactions contain several sub-families that either bind to different partners or do not interact at all. Specificity in these interactions is often decisive for the function of the proteins involved. The difference in specificity between the interacting and non-interacting groups might therefore be used to recognise interaction sites [1].

References
[1] Feenstra, KA, G Bastianelli & J Heringa. Predicting Protein Interactions from Functional Specificity. In: From Computational Biophysics to Systems Biology. NIC Series 40 89–92 2008
[2] Pirovano, W, KA Feenstra & J Heringa. Sequence comparison by sequence harmony identifies subtype specific functional sites. Nucl. Acids Res. 34 6540–6548 2006. www.ibi.vu.nl/programs/seqharmwww
[3] Ofran, Y & B Rost. Bioinformatics 23 13–16 2007

Building interacting and non-interacting subgroups | Investigating differences in conservation at interaction positions between interacting and non-interacting homologs requires well-aligned sets of interacting homodimers and non-interacting monomers.

Results
Entropy and Specificity differences | For each alignment, overall entropies and Sequence Harmony (SH) [2] scores were calculated and averaged over interface, surface and buried positions, both for the homodimer subgroup and for the whole alignment (A).
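The per-position quantities described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: gaps ('-') are simply ignored, and the SH-style score is written as one minus the base-2 Jensen-Shannon divergence between the residue distributions of the two subgroups, so 1 means identical composition and 0 means no shared residue type (low values flag group-specific positions). The published SH definition is given in [2]. All function names, the toy alignment and the structural labels are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def column_distribution(column):
    """Relative residue frequencies in one alignment column; gaps ('-') are ignored (assumption)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in counts.items()}


def shannon_entropy(column):
    """Shannon entropy, in bits, of one alignment column."""
    return sum(-p * math.log2(p) for p in column_distribution(column).values())


def sequence_harmony(column_a, column_b):
    """SH-style score = 1 - Jensen-Shannon divergence (base 2) between two groups.

    1.0: both groups have the same residue composition at this position.
    0.0: the groups share no residue type (maximally group-specific).
    """
    pa, pb = column_distribution(column_a), column_distribution(column_b)
    score = 0.0
    for aa in set(pa) | set(pb):
        a, b = pa.get(aa, 0.0), pb.get(aa, 0.0)
        if a > 0:
            score += a * math.log2((a + b) / a)
        if b > 0:
            score += b * math.log2((a + b) / b)
    return score / 2


def average_by_class(scores, classes):
    """Average per-position scores within each structural class (interface/surface/buried)."""
    grouped = {}
    for score, cls in zip(scores, classes):
        grouped.setdefault(cls, []).append(score)
    return {cls: mean(values) for cls, values in grouped.items()}


# Hypothetical toy alignment, split into an interacting (homodimer) and a non-interacting subgroup.
dimers = ["AKLV", "AKLI", "AKMV"]
monomers = ["AELV", "ADLV", "AELV"]
classes = ["buried", "interface", "surface", "surface"]  # hypothetical structural labels

n_cols = len(dimers[0])
sh = [sequence_harmony([s[i] for s in dimers], [s[i] for s in monomers]) for i in range(n_cols)]
entropy_whole = [shannon_entropy([s[i] for s in dimers + monomers]) for i in range(n_cols)]
entropy_dimers = [shannon_entropy([s[i] for s in dimers]) for i in range(n_cols)]

print(average_by_class(sh, classes))
print(average_by_class(entropy_dimers, classes))
print(average_by_class(entropy_whole, classes))
```

In this toy example the interface column scores SH = 0.0 because the two subgroups share no residue there (K versus E/D), while the fully conserved buried column scores 1.0.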
ΔEntropy (B) shows differences between interface and other surface residues. The specificity signal appears stronger at longer HSP (High-scoring Segment Pair) matches (C).

Performance of predicting interaction sites | We measured how well this specificity signal predicts interaction sites using the area under the ROC curve (AUC). Performance increases with longer HSP match length (D, E) and has an optimum between 60% and 80% identity filtering in our datasets (E).
(D) ROC plots measuring the performance of interface prediction at different HSP lengths, with minimal sequence length 400.
(E) Average AUC versus HSP length, with minimal sequence length 400, for datasets with different identity cut-offs.

Compared with Ofran & Rost | Venn diagram showing the overlap between the sites predicted by our method (SH), those predicted by the method of Ofran & Rost [3], and the positives as defined from PISA. The overlap seems very small, but the methods appear to be complementary: together they reach almost 50% coverage of the positives, while our SH method alone reaches 42%.

Example of prediction | With an SH [2] cut-off of 0.2 (SH < 0.2), our method identifies 85 positions (32% of the protein), including 21 of all 31 interface sites (68%). This phosphatase subfamily is only active as an oligomer, because the interactions are crucial for positioning the substrate specificity loop, whereas other subfamilies may not all require oligomerization.
Example figure legend: interface; predicted interface positions.

Conclusion and Discussion
Our results show that, in this dataset, the SH signal is able to distinguish interface residues from other surface residues. Interaction sites can be predicted among all residues using nothing more than sequence and group-specificity information, with precision similar to other methods but much higher recall. Further improvement of prediction performance could be expected from including additional sequence features, such as neighbour support information or surface prediction.
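The evaluation quantities used above (AUC, and precision and recall at the SH < 0.2 cut-off) can be sketched as follows. This is an illustrative sketch, not the authors' evaluation code; the function names and toy data are hypothetical. AUC is computed through its rank interpretation (the probability that a randomly chosen interface position is ranked above a randomly chosen non-interface position, with ties counting one half), and because low SH signals specificity, positions are ranked by -SH.

```python
def roc_auc(scores, labels):
    """AUC via pairwise ranking: P(score of a positive > score of a negative), ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative position")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def precision_recall(sh_scores, is_interface, cutoff=0.2):
    """Predict a position as interface when SH < cutoff; return (precision, recall)."""
    predicted = [sh < cutoff for sh in sh_scores]
    true_pos = sum(p and t for p, t in zip(predicted, is_interface))
    n_predicted = sum(predicted)
    n_interface = sum(is_interface)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_interface if n_interface else 0.0
    return precision, recall


# Hypothetical per-position SH scores and interface labels (e.g. derived from PISA).
sh = [0.05, 0.10, 0.50, 0.90, 0.15, 0.70]
is_interface = [True, True, False, False, False, True]

auc = roc_auc([-s for s in sh], is_interface)  # low SH = more specific = ranked higher
precision, recall = precision_recall(sh, is_interface, cutoff=0.2)
print(f"AUC = {auc:.3f}, precision = {precision:.2f}, recall = {recall:.2f}")

# Same arithmetic applied to the counts in the "Example of prediction" paragraph.
tp, n_predicted, n_interface = 21, 85, 31
print(f"recall = {tp / n_interface:.0%}, precision = {tp / n_predicted:.0%}")
# recall = 68%, precision = 25%
```

The pairwise loop costs O(positives × negatives); for whole datasets, a sort-based Mann-Whitney U formulation gives the same AUC more efficiently.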
(A) Average entropy for interface, surface and buried residues changes with the %identity cut-off. (B) Entropy differences and (C) the specificity signal SH between interface and other surface residues increase at longer HSP length.

Venn diagram counts (positions; percentages are fractions of all PISA positives):
Positives predicted by neither method: 46472 (50%)
Positives predicted by SH only: 35869 (39%)
Positives predicted by Ofran & Rost only: 7443 (8%)
Positives predicted by both methods: 2748 (3%)
Non-positives predicted by SH only: 124979
Non-positives predicted by Ofran & Rost only: 23061
Non-positives predicted by both methods: 9497

Example structure: 3QGM, shown as the dimer and as only one monomer.

F1000 Posters: Use Permitted under Creative Commons License.
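The coverage figures quoted in the comparison with Ofran & Rost follow from the Venn-diagram counts. The sketch below redoes that arithmetic, assuming that the four region counts carrying percentages (46472, 35869, 7443 and 2748) partition the PISA positives into: predicted by neither method, by SH only, by Ofran & Rost only, and by both. The function name and dictionary keys are hypothetical labels.

```python
def coverage(regions):
    """Fraction of PISA positives recovered by SH, and by SH or Ofran & Rost combined.

    Assumed keys: 'neither', 'sh_only', 'ofran_rost_only', 'both' (positive regions only).
    SH coverage counts positives found by SH with or without Ofran & Rost.
    """
    total = sum(regions.values())
    sh_alone = (regions["sh_only"] + regions["both"]) / total
    combined = (regions["sh_only"] + regions["ofran_rost_only"] + regions["both"]) / total
    return total, sh_alone, combined


# Positive-region counts from the Venn diagram (assumed region assignment).
positive_regions = {
    "neither": 46472,
    "sh_only": 35869,
    "ofran_rost_only": 7443,
    "both": 2748,
}

total, sh_alone, combined = coverage(positive_regions)
print(f"{total} positives; SH alone {sh_alone:.0%}; SH or Ofran & Rost {combined:.0%}")
# 92532 positives; SH alone 42%; SH or Ofran & Rost 50%
```

Under this reading the combined coverage is 49.8%, which matches the "almost 50%" stated for the two methods together, and SH alone covers 41.7% (42%).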