(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 5, 2022
OvSbChain: An Enhanced Snowball Chain Approach for Detecting Overlapping Communities in Social Graphs
Jayati Gulati, Muhammad Abulaish
Department of Computer Science, South Asian University, New Delhi, India
Sajid Yousuf Bhat
Department of Computer Sciences, University of Kashmir, J&K, India
Abstract—Overlapping Snowball Chain is an extension of Snowball Chain, which is based on the concept of community formation in line with the snowball chaining process. The approach is inspired by the snowball sampling process, wherein a snowball grows to form a chain of nodes, leading to the formation of mutually exclusive communities in Snowball Chain. In the current work, nodes are allowed to be shared among different snowball chains in a graph, leading to the formation of overlapping communities. Unlike its predecessor Snowball Chain, the proposed technique does not require any hyper-parameter, which is often difficult to tune in most existing methods. The proposed algorithm works in two phases: overlapping chains are formed in the first phase and are then combined using a similarity-based criterion in the second phase. The communities identified at the end of the second phase are evaluated using different measures, including modularity, overlapping NMI, and running time, over both real-world and synthetic benchmark datasets. The proposed Overlapping Snowball Chain method is also compared with eleven state-of-the-art community detection methods.
Keywords—Clustering coefficient; community detection; overlapping communities; snowball sampling; social graph
I. INTRODUCTION
In recent years, there has been tremendous growth in the study of linked data in the form of networks, such as the Internet, the World Wide Web, and social networks.
The relationships among the entities in these networks provide rich insights into various dynamic interactions and may prove beneficial in various applications [1]. To analyse and study these networks, a graph is used as the data structure; it consists of a set of nodes joined by links or edges that can be labelled/unlabelled, directed/undirected, or signed/unsigned. The representation of an online social network is termed a social graph, which provides a good visualization and eases the interpretation of the network.
One of the emerging research areas in social network analysis is community detection, which digs deep into the social graph and mines the densest subgraphs, which are highly cohesive in nature. A community in a network is represented by a set of nodes with high-density links among themselves but low-density links to nodes in other communities [2]. These subgraphs are called communities or modules. Community detection in a social graph mainly involves splitting it into its constituent functional groups. The task has largely been addressed in a distinct community context, wherein the communities are considered to be mutually exclusive. However, in real-world networks, community structures can be overlapping, wherein a node belongs to multiple communities. CMiner [3], a density-based approach, defines a distance function to capture similarity among nodes and identifies overlapping communities based on this distance function. Another work, OCTracker [4], detects overlapping communities along with their evolution. A similar work, HOCTracker [5], identifies hierarchical communities and works for dynamic social networks. The work in this paper aims to address this issue by proposing a novel overlapping community detection algorithm that extends the existing SbChain algorithm.
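The notions of a community (dense links inside, sparse links outside) and of an overlapping node can be made concrete with a small sketch. The toy graph, the helper names `build_graph`, `edge_counts`, and `overlapping_nodes`, and the two hand-picked communities are all illustrative assumptions; this is not the OvSbChain procedure, only a way to inspect the definitions given above.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected, unlabelled graph as a dict of neighbour sets."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def edge_counts(graph, community):
    """Return (internal, external) edge counts for a set of nodes."""
    internal_twice = 0  # each internal edge is seen from both endpoints
    external = 0        # each external edge is seen once, from the inside
    for u in community:
        for v in graph[u]:
            if v in community:
                internal_twice += 1
            else:
                external += 1
    return internal_twice // 2, external


def overlapping_nodes(communities):
    """Return the nodes that belong to more than one community."""
    shared = set()
    for a, b in combinations(communities, 2):
        shared |= a & b
    return shared


# Hypothetical toy graph: two triangles that share node 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
graph = build_graph(edges)
communities = [{1, 2, 3}, {3, 4, 5}]

for community in communities:
    internal, external = edge_counts(graph, community)
    print(sorted(community), "internal:", internal, "external:", external)
print("overlapping nodes:", sorted(overlapping_nodes(communities)))
```

In this toy case each triangle has more internal than external edges, and node 3 is shared by both groups, which is the situation that a crisp (mutually exclusive) partition cannot represent.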
The proposed method, named OvSbChain, starts by identifying the seed or core nodes in a social graph based on a node parameter called normalized degree. The nodes in the entire social graph are ranked on this parameter and processed in non-increasing order of their ranks. The method works in two phases. In the first phase, every node is paired with its best-suited neighbor according to a score value in each iteration. After several iterations, chains of nodes are formed that may share nodes with each other, i.e., there could be overlapping nodes among different chains. Therefore, the proposed technique is called overlapping snowball chains. The second phase tries to combine chains based on a similarity criterion, as discussed in Section III, which finally leads to the formation of overlapping communities. The technique thus addresses the problem at hand, i.e., community detection, using an uncomplicated and elementary strategy. The major enhancements in this work can be summarized as follows:
1) OvSbChain introduces overlapping communities, unlike SbChain, which produces only crisp communities.
2) No hyper-parameter tuning is required in OvSbChain; hence, it produces the same set of communities every time it is run.
3) SbChain uses a maximum common neighbor criterion for finding its best neighbor, whereas OvSbChain uses a normalized degree function to find its best neighbor. The two techniques also differ in the way they find the seed nodes. This is discussed in detail in Section III.
4) The results are evaluated and compared based on