Information Sciences 000 (2018) 1–19
https://doi.org/10.1016/j.ins.2018.07.018
0020-0255/© 2018 Published by Elsevier Inc.

Counting cliques in parallel without a cluster: engineering a fork/join algorithm for shared-memory platforms

Emilio Coppa, Irene Finocchi, Renan Leon Garcia ∗
Computer Science Department, Sapienza University of Rome, Rome, Italy

∗ Corresponding author. E-mail addresses: coppa@di.uniroma1.it (E. Coppa), finocchi@di.uniroma1.it (I. Finocchi), garcia@di.uniroma1.it (R.L. Garcia).

Article info
Article history: Received 1 February 2018; Revised 19 June 2018; Accepted 8 July 2018
Keywords: Clique counting; Divide-and-conquer parallelism; Fork/join; Subgraph enumeration

Abstract
In this paper we develop simple and fast multicore parallel algorithms for counting the number of k-cliques in large undirected graphs, for any small constant k ≥ 4. Clique counting is an important problem in a variety of network analytics applications. Unlike existing solutions, which mainly target distributed-memory settings (e.g., MapReduce), our algorithms work on off-the-shelf shared-memory multicore platforms. We assess the effectiveness of our approach through an extensive experimental analysis on a variety of real-world graphs, considering different clique sizes and scalability on different numbers of cores. The experimental results show that our parallel algorithms run much faster than highly optimized sequential solutions and scale gracefully to non-trivial values of k even on medium/large graphs. For instance, computing hundreds of billions of cliques for rather demanding Web graphs and social networks requires about 15 min on a 32-core machine. As a by-product of our experimental analysis, we also compute the exact number of k-cliques with at most 20 nodes in many real-world networks from the SNAP repository.

1. Introduction

The problem of counting (and possibly listing) all the occurrences of a small pattern subgraph in a given graph has a long history. The first papers date back to the 1970s, but there has been renewed interest in the last few years in connection with the growth of network analytics applications. In particular, the enumeration of small dense subgraphs has been the subject of many recent works. Modern real-world networks indeed have a large number of nodes and sparse connections but, owing to the locality of relationships, are locally very dense, i.e., they contain an enormous number of small dense subgraphs that can be exploited for a variety of tasks such as spam and fraud detection [13], social network analysis [34], link classification and recommendation [45], and the discovery of patterns in biological networks [35].

The focus of this paper is on listing k-cliques (i.e., complete subgraphs of k nodes) in large-scale networks, for any small constant k: k-clique counting and enumeration is an important building block in numerous graph mining algorithms, see, e.g., [15,36,43]. In Table 1 we present the number of k-cliques for k ≤ 7 on a selection of datasets from the SNAP repository (Stanford Network Analysis Platform, http://snap.stanford.edu/), a general-purpose, high-performance system for the analysis and manipulation of large networks: besides a graph mining library, SNAP also provides a collection of more than 50 medium-size and large real-world datasets. As shown in Table 1, graphs with millions of edges can easily have hundreds of billions of k-cliques. This sheer size makes the design