Link Prediction in Complex Networks Based on Cluster Information

Jorge Carlos Valverde-Rebaza and Alneu de Andrade Lopes

Departamento de Ciências de Computação
Instituto de Ciências Matemáticas e de Computação
Universidade de São Paulo - Campus de São Carlos
Caixa Postal 668, 13560-970 São Carlos, SP, Brazil
{jvalverr,alneu}@icmc.usp.br

Abstract. A cluster in a graph is a densely connected group of vertices that is sparsely connected to other groups. Hence, when predicting a future link between a pair of vertices, the common neighbors of these vertices may play different roles depending on whether or not they belong to the same cluster. Based on that, we propose a new measure (WIC) for link prediction between a pair of vertices that considers the sets of their intra-cluster or within-cluster (W) and between-cluster or inter-cluster (IC) common neighbors. We also propose a set of measures, referred to as W forms, that use only the set of within-cluster common neighbors instead of the set of all common neighbors usually considered in the basic local similarity measures. Consequently, a clustering scheme must first be applied to the graph. Using three different clustering algorithms, we compared the WIC measure with ten basic local similarity measures and their counterpart W forms on ten real networks. Our analyses suggest that clustering information, no matter which clustering algorithm is used, improves link prediction accuracy.

Keywords: Link Prediction, Complex Networks, Clustering.

1 Introduction

Many social, biological, and information systems can be naturally described as networks, where vertices represent entities (individuals or organizations) and links denote relations or interactions between vertices [18], [30]. Networks, or graphs, are a powerful representation that has been employed in different tasks of machine learning (ML) and data mining (DM).
This growing interest in the use of graphs can be justified by the expressiveness of this representation. Its applications include supervised learning [16], [4], [19]; unsupervised learning [6], [25], [24], [20]; and semi-supervised learning [5], [3], [12], to cite just a few.

An important scientific issue in network analysis that has attracted increasing attention in recent years is link prediction. The link prediction problem aims to estimate the likelihood of the future existence of a link between two disconnected vertices in a network, based on the observed links [13].

L.N. Barros et al. (Eds.): SBIA 2012, LNAI 7589, pp. 92–101, 2012. © Springer-Verlag Berlin Heidelberg 2012
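
To make the setting concrete, the following minimal sketch scores every pair of disconnected vertices in a toy graph by its number of common neighbors and by the number of those common neighbors that lie within the same cluster (the W form of the common-neighbors count, as described in the abstract). It is only an illustration under stated assumptions: the function names, the toy graph, and the cluster labels are hypothetical, and the convention that a common neighbor counts as within-cluster only when both endpoints and the neighbor share one cluster is an assumption made here. The WIC measure itself is not reproduced, since its formula is not given in this part of the text.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def split_common_neighbors(adj, cluster, x, y):
    """Split the common neighbors of x and y into (within, inter) sets.

    Assumption (not taken from the text above): a common neighbor z is
    within-cluster only if x, y and z all carry the same cluster label;
    every other common neighbor is treated as inter-cluster.
    """
    common = adj[x] & adj[y]
    if cluster[x] != cluster[y]:
        return set(), set(common)
    within = {z for z in common if cluster[z] == cluster[x]}
    return within, common - within


def score_unlinked_pairs(adj, cluster):
    """Map each disconnected pair to (all common neighbors, within-cluster ones)."""
    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # only currently disconnected pairs are candidates
        within, inter = split_common_neighbors(adj, cluster, x, y)
        scores[(x, y)] = (len(within) + len(inter), len(within))
    return scores


# Hypothetical toy network: cluster 0 = {a, b, c, d}, cluster 1 = {e, f}.
toy_edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"),
             ("a", "e"), ("b", "e"), ("e", "f")]
toy_cluster = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1}
toy_scores = score_unlinked_pairs(build_adjacency(toy_edges), toy_cluster)
```

In this toy network the pair (a, b) has three common neighbors, c, d and e, of which only c and d are within-cluster. The pair (c, e) has two common neighbors but none within a shared cluster, so the two counts rank these candidate links differently.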