Journal of Intelligent & Fuzzy Systems xx (20xx) x–xx
DOI:10.3233/JIFS-182765
IOS Press
Scaling density-based community detection
to large-scale social networks via
MapReduce framework
Muhammad Abulaish (a), Ishfaq Majid Bhat (b) and Sajid Yousuf Bhat (c,∗)
(a) Department of Computer Science, South Asian University, Delhi, India
(b) Department of Information Technology, Central University of Kashmir, J&K, India
(c) Department of Computer Sciences, University of Kashmir, J&K, India
Abstract. Community detection in networks is a long-standing and challenging task in complex network research.
The challenges include the overlapping and hierarchical nature of communities, the dynamics of networks and their
underlying communities, and the scalability of detection algorithms to large-scale networks. Traditional community
detection methods do not scale readily to large networks, mainly because they compute global network metrics.
This paper presents a novel scalable overlapping community detection approach for large-scale networks: a
MapReduce-based implementation of a density-based local community detection method. The method has two stages.
The first stage uses MapReduce to identify a mutual-core connected subgraph of the underlying network. The second
stage uses an existing connected component detection method, implemented via MapReduce, to identify the connected
components of the mutual-core connected subgraph generated in the first stage. A community is then taken as the
union of the core-nodes in a connected component and the respective density-based neighborhood of each core-node
in that component. The resulting approach is among the first scalable overlapping community detection methods
proposed in the literature.
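The community-construction step described above can be illustrated with a minimal single-machine Python sketch. It is not the MapReduce implementation of this paper. It assumes that the first stage has already produced the core-nodes, the mutual-core edges between them, and the density-based neighborhood of each core-node; the function and parameter names (`connected_components`, `build_communities`, `core_nodes`, `mutual_core_edges`, `neighborhood`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components (as sets) of an undirected graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            u = stack.pop()
            component.add(u)
            for w in adjacency[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


def build_communities(core_nodes, mutual_core_edges, neighborhood):
    """Union each component's core-nodes with their neighborhoods.

    Assumption: all three inputs are precomputed by an earlier stage;
    how they are derived is not shown here.
    """
    communities = []
    for component in connected_components(core_nodes, mutual_core_edges):
        community = set(component)
        for core in component:
            community |= neighborhood.get(core, set())
        communities.append(community)
    return communities


# Toy input (made up for illustration): cores 1 and 2 are mutually
# connected, core 5 stands alone, and node 4 lies in two neighborhoods.
cores = [1, 2, 5]
edges = [(1, 2)]
nbrs = {1: {2, 3}, 2: {1, 3, 4}, 5: {4, 6}}
result = build_communities(cores, edges, nbrs)
```

In this toy input, node 4 belongs to the neighborhoods of core-nodes from two different components, so it appears in both resulting communities; this is how the union step can yield overlapping communities.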
Keywords: Community detection, overlapping community, connected components, MapReduce, large-scale networks
1. Introduction
Network systems are ubiquitous in nature and society, and form the basic structure representing
interactions among various related entities. Some important network systems include (i) real-world
social networks like human proximity networks, friendship networks, terror networks, and crime/gang
networks, (ii) biological networks like protein-protein interaction networks and gene regulatory
networks, (iii) computer and computer-generated networks like the Internet and the WWW, (iv) online
social networks like Facebook, Twitter, and LinkedIn, (v) financial networks like banking transaction
networks, (vi) road networks, (vii) power-grid networks, and (viii) telecommunication networks like
mobile call graphs.
∗ Corresponding author. Sajid Yousuf Bhat, Department of Computer Sciences, University of Kashmir,
J&K, India. E-mail: bhatsajid@uok.edu.in.
Social network analysis (SNA) is a multidisciplinary field dedicated to the analysis and
modelling of relations and diffusion processes between various objects of network structures
found in nature and society, and other information/knowledge processing entities. The aim of SNA
is to understand how the behaviour and interaction of such entities translate to large-scale
social network systems. SNA is one of the important techniques
ISSN 1064-1246/19/$35.00 © 2019 – IOS Press and the authors. All rights reserved