JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015

DLCD-CCE: A Local Community Detection Algorithm for Complex IoT Networks

Xiaolong Xu, Nan Hu, Marcello Trovati, Jeffrey Ray, Francesco Palmieri, and Hari Mohan Pandey

Abstract—The Internet of Things (IoT) refers to the complex systems generated by the interconnections among widely available objects. Such interactions generate large networks, whose complexity must be addressed with suitable, computationally efficient approaches. In this article, we propose a distributed local community detection algorithm based on specific properties of community centre expansions (DLCD-CCE) for large-scale complex networks. The algorithm is evaluated via a prototype system, based on Spark, to verify its accuracy and scalability. The results demonstrate that, compared to typical local community detection algorithms, DLCD-CCE has better accuracy, stability and scalability, and effectively overcomes the sensitivity of existing algorithms to the location of the initial seeds.

Index Terms—Complex Networks, Network Dynamics, Community Detection, IoT

I. INTRODUCTION

MANY real-world systems associated with IoT can be successfully modelled as complex networks [1], where community detection can provide insight into their topological properties. To achieve this, nodes are grouped, according to the network topology, into communities whose members are densely connected. On the other hand, as discussed in [2], the connections among different communities are sparse. Depending on the context, community structures are likely to have different connotations. For example, communities in social networks represent groups of people with similar characteristics [3], whereas in biological networks they reveal biological tissues with similar functions, and communities in Web documents contain large numbers of topic-related documents.
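To make the notion of a community concrete (dense connections inside, sparse connections between), the following minimal sketch counts intra-community and inter-community edges for a toy graph. The graph, the partition, and the function name `count_edges` are illustrative assumptions and do not come from the article; the sketch is not part of DLCD-CCE itself.

```python
from typing import Dict, Hashable, Iterable, Tuple


def count_edges(
    edges: Iterable[Tuple[Hashable, Hashable]],
    community_of: Dict[Hashable, Hashable],
) -> Tuple[int, int]:
    """Return (intra, inter): edges inside a community vs. edges between communities."""
    intra = 0
    inter = 0
    for u, v in edges:
        if community_of[u] == community_of[v]:
            intra += 1
        else:
            inter += 1
    return intra, inter


# Hypothetical toy graph: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
# Hypothetical partition: each triangle is treated as one community.
community_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

intra, inter = count_edges(edges, community_of)
print(f"intra-community edges: {intra}, inter-community edges: {inter}")
```

In this toy case most edges fall inside a community and only the bridge edge crosses between them, which is the structural pattern that community detection algorithms try to recover.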
In this article, we propose DLCD-CCE, a novel distributed local community detection algorithm based on community centre expansion. Our motivation for this research is its wide applicability, as well as the tangible benefits it provides to a much wider scientific and analytic community. The main contributions of this work include the following:
• An efficient method for measuring node centricity, which first calculates the node's density, and subsequently computes the weighted average of its density and that of its neighbours as the community centricity. The larger the community centricity of a node, the more important it is in the community to which it belongs.
• The design and implementation of a distributed and parallelised algorithm (DLCD-CCE), based on Spark GraphX. This allows the algorithm to be easily implemented on the current mainstream big data processing platform with good scalability.

The rest of the article is organised as follows: Sections II and III discuss the relevant existing techniques and approaches. Section IV introduces the local community detection algorithm proposed in this article and its parallelisation, and Section V details the experimental results and the corresponding analysis. Finally, Section VI concludes the paper by summarising the main contributions and pointing out future research directions.

Xiaolong Xu and Nan Hu are with Jiangsu Key Laboratory of Big Data Security Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing, China.
Marcello Trovati, Jeffrey Ray and Hari Mohan Pandey are with the Department of Computer Science, Edge Hill University, Ormskirk, UK.
Francesco Palmieri is with the Department of Computer Science, University of Salerno, Italy.
Copyright (c) 2012 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

II. RELATED WORK

Since the emergence of Network Theory, several community detection algorithms have been introduced. However, DLCD-CCE has specific properties and features which make it particularly suited to this type of task, as well as providing a more efficient approach. In this section, relevant existing technologies and methods are discussed.

Niu et al. [4] have proposed a new type of multi-objective approach based on a label propagation algorithm (LDMGA) for community detection in dynamic networks. Based on the multi-objective genetic algorithm, the evolutionary clustering algorithm is transformed into a multi-objective optimisation problem, which not only improves the clustering quality, but also minimises the clustering drift from one time step to the next. LDMGA is effective in clustering, but the search speed of the genetic algorithm is slow. To obtain better clustering results, it requires multiple iterative calculations, so LDMGA is not suitable for distributed computing.

In [5], the authors propose PLPIRV (Parallel Label Propagation and Incremental Related Vertices), where the label propagation process is integrated with the properties of incremental related vertices. Based on the communities found in the previous interval, this algorithm incrementally adjusts the communities to which the vertices belong, and gradually analyses the changes of the network, in order to avoid clustering the whole network.

In [6], a new game-theoretic approach towards community detection in large-scale complex networks is introduced, which is based on modified modularity. This method was developed from modified adjacency and modified Laplacian matrices, as well as neighbourhood similarity, which can classify a given