Community Detection in Incomplete Information Networks

Wangqun Lin
National University of Defense Technology, Changsha, China
linwangqun2005@gmail.com

Xiangnan Kong
University of Illinois at Chicago, Chicago, Illinois
xkong4@uic.edu

Philip S. Yu
University of Illinois at Chicago, Chicago, Illinois
psyu@uic.edu

Quanyuan Wu
National University of Defense Technology, Changsha, China
quanyuanwu@nudt.edu.cn

Yan Jia
National University of Defense Technology, Changsha, China
yanjia@nudt.edu.cn

Chuan Li
Sichuan University, Chengdu, China
lcharles@scu.edu.cn

ABSTRACT
With the recent advances in information networks, the problem of community detection has attracted much attention in the last decade. While network community detection has been ubiquitous, the task of collecting complete network data remains challenging in many real-world applications. Usually the collected network is incomplete, with most of the edges missing. Commonly, in such networks, all nodes and their attributes are available, while only the edges within a few local regions of the network can be observed. In this paper, we study the problem of detecting communities in incomplete information networks with missing edges. We first learn a distance metric that reproduces the link-based distance between nodes from the observed edges in the local information regions. We then use the learned distance metric to estimate the distance between any pair of nodes in the network. A hierarchical clustering approach is proposed to detect communities within the incomplete information networks. Empirical studies on real-world information networks demonstrate that our proposed method can effectively detect community structures within incomplete information networks.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data Mining

General Terms
Algorithms, Experimentation

Keywords
Community detection, incomplete information networks, distance metric learning

Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2012, April 16–20, 2012, Lyon, France. ACM 978-1-4503-1229-5/12/04.

1. INTRODUCTION
Information networks arise naturally in a wide range of domains. Examples include biological networks, publication networks and social networks. In these networks, feature vectors associated with the nodes are usually available, and links represent relationships between the nodes. Identifying communities in information networks is a crucial step toward understanding the network structures. A community is defined as a group of nodes that are densely connected inside the group but loosely connected to the nodes outside the group.

Community detection in network data has been extensively studied in the literature [17, 19, 3, 18, 2, 21, 14]. Conventional approaches focus on detecting communities based upon linkage information. They assume that the complete linkage information within the entire network is available. However, in many real-world networks, such as terrorist-attack information networks, the complete linkages are very difficult or even impossible to obtain. Instead, the complete linkage information is only available within a few small local regions. We note that a similar problem has also been studied in [13]. However, in this paper, we focus on incomplete information networks with local information regions. For example, in work relation networks, it is usually impossible to obtain the complete linkage information among all the people, but we can usually afford to obtain the work relationships within a small number of local regions, such as groups or organizations. These networks are called incomplete information networks in this paper. The local regions with complete linkage information are called local information regions.
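As a rough illustration of these two definitions, the sketch below models an incomplete information network in which every node carries a feature vector but the link status of a node pair is known only when both nodes fall inside the same local information region. The class name `IncompleteNetwork`, its fields, and the toy node ids and feature values are hypothetical and chosen only for this example; they are not notation or data from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass
class IncompleteNetwork:
    """Hypothetical container for an incomplete information network."""
    attributes: Dict[str, List[float]]   # node id -> feature vector (known for all nodes)
    regions: List[Set[str]]              # local information regions (sets of node ids)
    edges: Set[FrozenSet[str]] = field(default_factory=set)  # observed edges only

    def same_region(self, u: str, v: str) -> bool:
        # Linkage is fully observed only for pairs inside one local region.
        return any(u in r and v in r for r in self.regions)

    def add_edge(self, u: str, v: str) -> None:
        if not self.same_region(u, v):
            raise ValueError("edges are observable only inside a local information region")
        self.edges.add(frozenset((u, v)))

    def link_status(self, u: str, v: str) -> str:
        # Outside the local regions a missing edge means "unobserved", not "absent".
        if not self.same_region(u, v):
            return "unknown"
        return "linked" if frozenset((u, v)) in self.edges else "not linked"


# Toy example (made-up values): four nodes, one local region {a, b, c}.
net = IncompleteNetwork(
    attributes={"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.9], "d": [0.2, 0.7]},
    regions=[{"a", "b", "c"}],
)
net.add_edge("a", "b")

print(net.link_status("a", "b"))  # linked
print(net.link_status("a", "c"))  # not linked
print(net.link_status("c", "d"))  # unknown
```

The three-valued `link_status` reflects the key distinction in this setting: inside a local information region the absence of an edge is informative, whereas elsewhere it only means that the link was not observed.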
An incomplete information network with local information regions is shown in the upper left level of Figure 1. Some real-world examples of community detection in incomplete information networks are listed as follows:

• Terrorist-attack network. Consider a network of terrorist-attack activities within a given period in a certain country. Each node in the network represents a terrorist activity. Terrorist attacks committed by the same terrorist organization are linked with each other. Investigating the community structures within these networks is a challenging problem, since most of the connections/links between attacks are not clearly resolved. Detecting the communities in these incomplete information networks is crucial for analyzing the structures of terrorist-attack activities.

• Food web. The food web of a large ecosystem is usually a highly complex network. Each node in the network represents a living organism, while the links rep-