Information Sciences 000 (2018) 1–19
Counting cliques in parallel without a cluster: Engineering a
fork/join algorithm for shared-memory platforms
Emilio Coppa, Irene Finocchi, Renan Leon Garcia∗
Computer Science Department, Sapienza University of Rome, Rome, Italy
Article info
Article history:
Received 1 February 2018
Revised 19 June 2018
Accepted 8 July 2018
Available online xxx
Keywords:
Clique counting
Divide-and-conquer parallelism
Fork/join
Subgraph enumeration
Abstract
In this paper we develop simple and fast multicore parallel algorithms for counting the
number of k-cliques in large undirected graphs, for any small constant k ≥ 4. Clique counting
is an important problem in a variety of network analytics applications. Unlike
existing solutions, which mainly target distributed-memory settings (e.g., MapReduce), our
algorithms work on off-the-shelf shared-memory multicore platforms.
We assess the effectiveness of our approach through an extensive experimental analysis
on a variety of real-world graphs, considering different clique sizes and scalability on different
numbers of cores. The experimental results show that our parallel algorithms run much
faster than highly optimized sequential solutions and scale gracefully
to non-trivial values of k even on medium/large graphs. For instance, computing hundreds
of billions of cliques for rather demanding Web graphs and social networks requires about
15 minutes on a 32-core machine. As a by-product of our experimental analysis, we also compute
the exact number of k-cliques with at most 20 nodes in many real-world networks
from the SNAP repository.
© 2018 Published by Elsevier Inc.
1. Introduction
The problem of counting, and possibly listing, all the occurrences of a small pattern subgraph in a given graph has a
long history. The first papers date back to the 1970s, but interest has been renewed in the last few years in connection
with the growth of network analytics applications. In particular, the enumeration of small dense subgraphs has been the
subject of many recent works. Modern real-world networks indeed have a large number of nodes and sparse connections,
but due to the locality of relationships they are locally very dense, i.e., they contain an enormous number of small dense subgraphs that
can be exploited for a variety of tasks such as spam and fraud detection [13], social network analysis [34], link classification
and recommendation [45], and the discovery of patterns in biological networks [35].
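To make the objects being counted concrete, the following Python sketch counts k-cliques (complete subgraphs on k nodes) with a naive sequential recursion. It is only an illustration under stated assumptions and is not the fork/join algorithm developed in this paper: the function name `count_k_cliques`, the edge-list input format, and the use of integer node labels are all hypothetical choices made for the example.

```python
from typing import Dict, Iterable, Set, Tuple


def count_k_cliques(edges: Iterable[Tuple[int, int]], k: int) -> int:
    """Count the k-cliques of an undirected graph given as an edge list.

    Illustrative baseline only (assumed interface, not the paper's method).
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Orient every edge from its smaller to its larger endpoint, so that
    # each clique is discovered exactly once, in increasing node order.
    higher: Dict[int, Set[int]] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        lo, hi = (u, v) if u < v else (v, u)
        higher.setdefault(lo, set()).add(hi)
        higher.setdefault(hi, set())

    def extend(candidates: Set[int], remaining: int) -> int:
        # `candidates` contains the nodes that are adjacent to, and larger
        # than, every node already chosen for the partial clique.
        if remaining == 0:
            return 1
        if remaining == 1:
            return len(candidates)
        total = 0
        for v in candidates:
            total += extend(candidates & higher[v], remaining - 1)
        return total

    return extend(set(higher), k)
```

For example, on the complete graph with five nodes this function returns 5 for k = 4, since every choice of four nodes forms a clique. The recursion visits each clique of size up to k once, which is why the number of cliques itself bounds the work from below.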
The focus of this paper is on listing k-cliques (i.e., complete subgraphs of k nodes) in large-scale networks, for any
small constant k: k-clique counting and enumeration is an important building block in numerous graph mining algorithms,
see, e.g., [15,36,43]. In Table 1 we present the number of k-cliques for k ≤ 7 on a selection of datasets from the SNAP
repository,¹ a general purpose, high-performance system for the analysis and the manipulation of large networks: besides a
graph mining library, SNAP also provides a collection of more than 50 medium-size and large real-world datasets. As shown
in Table 1, graphs with millions of edges can easily have hundreds of billions of k-cliques. This sheer size makes the design
∗ Corresponding author.
E-mail addresses: coppa@di.uniroma1.it (E. Coppa), finocchi@di.uniroma1.it (I. Finocchi), garcia@di.uniroma1.it (R.L. Garcia).
¹ Stanford Network Analysis Platform: http://snap.stanford.edu/
https://doi.org/10.1016/j.ins.2018.07.018
0020-0255/© 2018 Published by Elsevier Inc.