Uncorrected Author Proof
Journal of Intelligent & Fuzzy Systems xx (20xx) x–xx
DOI:10.3233/JIFS-182765
IOS Press

Scaling density-based community detection to large-scale social networks via MapReduce framework

Muhammad Abulaish (a), Ishfaq Majid Bhat (b) and Sajid Yousuf Bhat (c,*)
(a) Department of Computer Science, South Asian University, Delhi, India
(b) Department of Information Technology, Central University of Kashmir, J&K, India
(c) Department of Computer Sciences, University of Kashmir, J&K, India

Abstract. Community detection in networks is one of the long-standing and challenging tasks in complex network research. It poses numerous challenges, including the overlapping and hierarchical nature of communities, the dynamics of networks and their underlying communities, and the scalability of detection algorithms to large-scale networks. Traditional community detection methods do not scale readily to large networks, mainly because they rely on computing global network metrics. This paper presents a novel, scalable overlapping community detection approach for large-scale networks in the form of a MapReduce-based implementation of a density-based local community detection method. The method is divided into two stages. The first stage uses MapReduce to identify a mutual-core connected subgraph of the underlying network. The second stage uses an existing connected component detection method, implemented via MapReduce, to identify the connected components of the mutual-core connected subgraph generated in the first stage. A community is then taken as the union of the core nodes in a connected component and the density-based neighborhood of each of those core nodes. The resulting approach is among the first scalable overlapping community detection methods proposed in the literature.
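The two-stage pipeline summarized in the abstract can be illustrated with a small, single-machine sketch that imitates MapReduce rounds through explicit map, shuffle, and reduce steps. It is not the authors' implementation, and several details are assumptions made only for illustration: the core-node rule (a SCAN-style test requiring at least `mu` neighbours whose cosine similarity of closed neighbourhoods is at least `eps`), the reading of "mutual-core connected" as eps-similar edges between two core nodes, the hypothetical helper names `run_mapreduce` and `detect_communities`, and hash-min label propagation as a stand-in for the unnamed connected component method.

```python
from collections import defaultdict
from math import sqrt


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect all values emitted for a key
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def detect_communities(edges, eps, mu):
    """Hypothetical two-stage sketch; eps, mu and the core rule are assumptions."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Closed neighbourhoods, assumed to be available to every mapper.
    closed = {u: adj[u] | {u} for u in adj}

    def similarity(u, v):
        # Assumed SCAN-style cosine similarity of closed neighbourhoods.
        return len(closed[u] & closed[v]) / sqrt(len(closed[u]) * len(closed[v]))

    # Stage 1a: an edge emits each endpoint to the other if they are eps-similar.
    def map_similar(edge):
        u, v = edge
        if similarity(u, v) >= eps:
            yield u, v
            yield v, u

    # Stage 1b: assumed core rule, a node is core if it has >= mu eps-similar neighbours.
    def reduce_core(node, eps_neighbours):
        if len(eps_neighbours) >= mu:
            yield node, set(eps_neighbours)

    core_nbhd = dict(run_mapreduce(edges, map_similar, reduce_core))

    # Stage 1c: assumed mutual-core subgraph, i.e. eps-similar edges between two core nodes.
    core_adj = defaultdict(set)
    for u, v in edges:
        if u in core_nbhd and v in core_nbhd and v in core_nbhd[u]:
            core_adj[u].add(v)
            core_adj[v].add(u)

    # Stage 2: hash-min label propagation, a stand-in connected-components method.
    labels = {u: u for u in core_nbhd}

    def map_labels(node):
        yield node, labels[node]
        for w in core_adj[node]:
            yield w, labels[node]

    def reduce_min(node, candidate_labels):
        yield node, min(candidate_labels)

    while True:
        new_labels = dict(run_mapreduce(list(labels), map_labels, reduce_min))
        if new_labels == labels:
            break  # no label changed in this round, so components have converged
        labels = new_labels

    # Community = core nodes of a component plus each core node's eps-neighbourhood.
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label] |= {node} | core_nbhd[node]
    return sorted(sorted(c) for c in communities.values())


if __name__ == "__main__":
    # Two 4-cliques joined through node 8, which is not itself a core node.
    example = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
               (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
               (3, 8), (8, 4)]
    print(detect_communities(example, eps=0.5, mu=3))
```

With `eps=0.5` and `mu=3`, each clique forms its own mutual-core component, while node 8 (eps-similar to core nodes 3 and 4 but with too few eps-similar neighbours to be core) is added to both communities, so the script prints [[0, 1, 2, 3, 8], [4, 5, 6, 7, 8]]. This shows how the union-of-neighbourhoods step yields overlapping communities. In a distributed setting, each `run_mapreduce` call would correspond to a separate job, and the label-propagation loop would need one job per round.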
*Corresponding author. Sajid Yousuf Bhat, Department of Computer Sciences, University of Kashmir, J&K, India. E-mail: bhatsajid@uok.edu.in.
ISSN 1064-1246/19/$35.00 © 2019 – IOS Press and the authors. All rights reserved

Keywords: Community detection, overlapping community, connected components, MapReduce, large-scale networks

1. Introduction

Network systems are ubiquitous in nature and society, and they form the basic structure representing interactions among various related entities. Some important network systems include (i) real-world social networks like human proximity networks, friendship networks, terror networks, and crime/gang networks, (ii) biological networks like protein-protein interaction networks and gene regulatory networks, (iii) computer and computer-generated networks like the Internet and the WWW, (iv) online social networks like Facebook, Twitter, and LinkedIn, (v) financial networks like banking transaction networks, (vi) road networks, (vii) power-grid networks, and (viii) telecommunication networks like mobile call graphs.

Social network analysis (SNA) is a multidisciplinary field dedicated to the analysis and modelling of relations and diffusion processes among the entities of network structures found in nature and society, as well as other information/knowledge-processing entities. The aim of SNA is to understand how the behaviour and interactions of such entities translate to large-scale social network systems. SNA is one of the important techniques