International Journal of Computer Trends and Technology (IJCTT) – volume 9 number 4 – Mar 2014
ISSN: 2231-2803 www.internationaljournalssrg.org Page 164

Link Prediction in Protein-Protein Networks: Survey

Manu Kurakar 1, Sminu Izudheen 2
1 (Department of Computer Science, Rajagiri School of Science and Technology-Kochi, India)
2 (Department of Computer Science, Rajagiri School of Science and Technology-Kochi, India)

ABSTRACT: Protein networks are of great importance in biological activities. A protein-protein interaction occurs when two or more proteins interact to carry out a biological activity. For example, signals from the exterior of a cell are mediated to the interior through these interactions. Identifying these interactions is of great significance for understanding complex diseases and for designing drugs. With the availability of large volumes of biological data, computational biology is now in a position to predict missing protein-protein interactions. This article summarizes techniques for missing link prediction.

Keywords - Link Prediction, Protein Networks, sequence similarity, clustering, interactions

I. INTRODUCTION

Much real-world information is better represented as networks, where nodes represent entities and edges represent the relationships between those entities. The study of complex networks is therefore of common interest to various branches of science. Consider the case of biological networks, where a large amount of data is available about protein-protein interactions. These interactions are of great importance for understanding biological activities, analyzing complex diseases, and designing new drugs. A protein-protein interaction occurs when two or more proteins bind together to perform a certain biological function. Such interactions are essential to many biological functions.
For example, signals from the exterior of a cell are mediated to the interior by protein-protein interactions of the signaling molecules. Identifying these interactions through experimental study involves very complex procedures. Yeast two-hybrid screening and affinity capture mass spectrometry are two important methods for determining them. With the availability of large amounts of biological data, various computational models have been proposed to predict missing protein-protein interactions. An important challenge with human protein-protein interactions is that our knowledge of them is very limited. For example, 99.7 per cent of human molecular interactions are still unknown [1]. Blindly checking every possible interaction is very expensive and not feasible, so prediction techniques are used to infer missing interactions from the known ones. This approach reduces cost, provided the prediction technique is accurate enough. This article summarizes various techniques and algorithms for link prediction on protein-protein interaction networks.

This article is organized as follows. It starts with an introduction to link prediction and then describes the representation of genetic data and its notation. Section III describes important link prediction algorithms developed so far, followed by the conclusion and references.

II. REPRESENTING GENETIC DATA

Protein-protein interaction (PPI) data can be represented using a graph data structure, where nodes represent proteins and edges represent their interactions. We can therefore represent the PPI data as an undirected network $G(V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. The protein-protein interaction network is created after removing multiple links and self-loops. The universal set $U$ contains all $\frac{|V|(|V|-1)}{2}$ possible links, where $|V|$ denotes the number of elements in the set $V$. The set of non-existent links is then $U - E$.
Link prediction rests on the assumption that some of the non-existent links are in fact missing links, that is, interactions that exist but have not yet been observed. The task is to predict these links accurately.
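
The representation described in this section can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed methods: the protein identifiers (`P1` to `P4`), the function names `build_ppi_network` and `candidate_links`, and the toy interaction list are all hypothetical. The sketch builds an undirected network after removing multiple links and self-loops, then enumerates the non-existent links among the |V|(|V|-1)/2 possible pairs.

```python
from itertools import combinations


def build_ppi_network(raw_interactions):
    """Return (V, E) for an undirected simple graph.

    Each edge is stored as a frozenset of two proteins, so (a, b) and
    (b, a) collapse into one link and duplicates are removed by the set.
    """
    nodes = set()
    edges = set()
    for a, b in raw_interactions:
        nodes.update((a, b))
        if a == b:
            continue  # drop self-loops
        edges.add(frozenset((a, b)))
    return nodes, edges


def candidate_links(nodes, edges):
    """Return U - E: every unordered protein pair that is not a known link."""
    universe = {frozenset(pair) for pair in combinations(sorted(nodes), 2)}
    return universe - edges


# Hypothetical toy data: one duplicated link and one self-loop.
raw = [("P1", "P2"), ("P2", "P1"), ("P2", "P3"), ("P3", "P3"), ("P3", "P4")]

V, E = build_ppi_network(raw)
non_existent = candidate_links(V, E)

n = len(V)
print("possible links |U|:", n * (n - 1) // 2)
print("known links |E|:", len(E))
print("non-existent links |U - E|:", len(non_existent))
```

A link prediction algorithm would then assign a score to each pair in `non_existent` and report the highest-scoring pairs as likely missing interactions. Enumerating all pairs explicitly is only practical for small networks, since the number of pairs grows quadratically with the number of proteins.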