Physica D 238 (2009) 1161–1167
Contents lists available at ScienceDirect
Physica D
journal homepage: www.elsevier.com/locate/physd

Modularity density of network community divisions

Erik Holmström (a, b, ∗), Nicolas Bock (b), Johan Brännlund (c)

(a) Instituto de Física, Universidad Austral de Chile, Valdivia, Chile
(b) Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
(c) Department of Mathematics & Statistics, Dalhousie University, Halifax, NS B3H 3J5, Canada

Article info
Article history: Received 3 April 2007; received in revised form 7 March 2008; accepted 23 March 2009; available online 5 April 2009
Communicated by A. Doelman
Keywords: Modularity; Modularity density; Network clusters; Network communities

Abstract
The problem of dividing a network into communities is extremely complex, and its difficulty grows very rapidly with the number of nodes and edges involved. To develop good algorithms for identifying optimal community divisions, it is highly beneficial to identify properties that most networks share. We introduce the concept of modularity density, the distribution of modularity values as a function of the number of communities, and find strong indications that the general features of this modularity density are quite similar for different networks. The region of high modularity generally has very low probability density and occurs where the number of communities is small. The properties and shape of the modularity density may provide valuable information and aid in the search for efficient algorithms that find community divisions with high modularity.
© 2009 Elsevier B.V. All rights reserved.

1. Introduction

The nodes of a network can be grouped into communities, loosely defined as groups of nodes that are more “related” to each other in some fashion than they are to the rest of the network. Such a community division can reveal important structures of the network.
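To make the modularity density from the abstract concrete, the following is a minimal brute-force sketch that enumerates every division of a tiny network and groups the resulting modularity values by the number of communities k. It assumes the standard Newman–Girvan form Q = Σ_c [L_c/m − (d_c/(2m))²], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes; this formula is not stated in this section. The function names (set_partitions, modularity, modularity_density) and the two-triangle example graph are hypothetical illustrations, not the authors' code. Exhaustive enumeration is only feasible for very small networks, because the number of divisions is the Bell number B(n), already 203 for n = 6 and 115,975 for n = 10.

```python
from collections import defaultdict


def set_partitions(items):
    """Yield every division of `items` into non-empty blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give `first` a block of its own.
        yield [[first]] + partition


def modularity(edges, partition):
    """Assumed form: Q = sum_c L_c/m - (d_c / 2m)^2 (undirected, unweighted graph)."""
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    label = {node: c for c, block in enumerate(partition) for node in block}
    q = 0.0
    for c, block in enumerate(partition):
        internal = sum(1 for u, v in edges if label[u] == c and label[v] == c)
        total_degree = sum(degree[n] for n in block)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


def modularity_density(nodes, edges):
    """Map the number of communities k to the Q values of all divisions with k communities."""
    by_k = defaultdict(list)
    for partition in set_partitions(list(nodes)):
        by_k[len(partition)].append(modularity(edges, partition))
    return dict(by_k)


# Hypothetical example graph: two triangles joined by the single edge (2, 3).
nodes = range(6)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
density = modularity_density(nodes, edges)
for k in sorted(density):
    values = density[k]
    print(f"k={k}: {len(values)} divisions, max Q = {max(values):.3f}")
```

Binning each list of Q values and normalizing by the total number of divisions would give an empirical density over (k, Q). In this toy graph the largest Q is reached by a single division, the two-triangle split at k = 2, which is in line with the abstract's observation that high modularity is rare and occurs at small numbers of communities.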
In a recent study, for instance, Wilkinson and Huberman [1] introduced a method that builds a network of gene co-occurrences from the literature and interprets its communities as groups of genes related to each other by their function. Since some of the genes in these communities are not known to be related to the community’s function, the method may help identify previously unknown relationships of this sort. Massen and Doye [2] used a community analysis of a potential energy landscape to identify transition states of small Lennard–Jones clusters. Networks have also been used very successfully to simulate dynamics in various systems: by modeling the community structure of individuals with a contact network model, Meyers et al. [3] predicted the dynamics of a SARS outbreak.

It is very difficult to find a good partitioning of a network into communities. In fact, maximizing the modularity Q, discussed below, is NP-hard [4]. Many different approaches have been used to identify community structures in networks. More recent methods include vertex similarity [5], vertex degree gradient [6], resistor networks [7], a Potts Hamiltonian model [8], and an information-theoretic approach [9]. For comparative reviews of community identification methods, see Refs. [10,11].

∗ Corresponding author at: Instituto de Física, Universidad Austral de Chile, Valdivia, Chile. Tel.: +56 63225938. E-mail addresses: erikh@lanl.gov, eholmstrom@uach.cl (E. Holmström).

The most popular methods appear to be those based on the network modularity Q introduced by Newman and coworkers [12–16]. The advantage of the modularity Q is that it is a well-defined number that quantifies the quality of a particular community division of a network. It is larger for divisions that split the network into groups with many intra-group edges and few inter-group edges. A number of different strategies have been proposed for finding the optimal community division based on the modularity.
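One modularity-based strategy of the kind mentioned here, greedy agglomeration [16,17], can be illustrated with the minimal sketch below. It starts with every node in its own community, repeatedly applies the merge of two communities that gives the largest Q, and records the resulting dendrogram. It assumes the standard Newman–Girvan form Q = Σ_c [L_c/m − (d_c/(2m))²] for an undirected, unweighted graph; the function names (modularity, greedy_agglomerate) and the two-triangle example graph are hypothetical. This is a simplified toy, not the published algorithm: it recomputes Q from scratch for every candidate pair.

```python
from itertools import combinations


def modularity(edges, communities):
    """Assumed Newman-Girvan Q for an undirected, unweighted graph; communities are sets."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for block in communities:
        internal = sum(1 for u, v in edges if u in block and v in block)
        total_degree = sum(degree.get(n, 0) for n in block)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


def greedy_agglomerate(nodes, edges):
    """Start from singletons and repeatedly apply the merge that yields the largest Q.

    Returns the dendrogram as a list of (number of communities, Q, division) steps.
    """
    communities = [frozenset([n]) for n in nodes]
    dendrogram = [(len(communities), modularity(edges, communities), communities)]
    while len(communities) > 1:
        best_q, best_division = None, None
        for a, b in combinations(range(len(communities)), 2):
            # Candidate division: communities a and b merged, all others unchanged.
            candidate = [c for i, c in enumerate(communities) if i not in (a, b)]
            candidate.append(communities[a] | communities[b])
            q = modularity(edges, candidate)
            if best_q is None or q > best_q:
                best_q, best_division = q, candidate
        communities = best_division
        dendrogram.append((len(communities), best_q, communities))
    return dendrogram


# Hypothetical example graph: two triangles joined by the single edge (2, 3).
nodes = range(6)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
dendrogram = greedy_agglomerate(nodes, edges)
k, q, division = max(dendrogram, key=lambda step: step[1])
print(k, round(q, 3), sorted(sorted(c) for c in division))
```

The sketch follows the dendrogram all the way down to a single community and then reports the step with the largest Q, which for this graph is the split into the two triangles. Scanning every pair costs many modularity evaluations per merge, so it only suits tiny graphs. Practical greedy implementations, such as Newman's fast algorithm, instead consider only pairs of communities joined by an edge and update the modularity gains incrementally.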
These methods can be broadly divided into two classes. Path-bound methods are either agglomerative or divisive. They successively add edges to the network, reducing the number of communities by merging existing ones (agglomerative), or take away edges, increasing the number of communities by splitting existing ones (divisive). In both cases, the possible community divisions at a given step depend on the previous steps of the algorithm, that is, on the particular path taken through the space of all possible community divisions. The resulting evolution of the community structure is commonly represented as a dendrogram. The methods in this class differ in how they identify the edges to be removed or added; examples are shortest-path betweenness [14], random-path betweenness [14], and the greedy algorithm [16,17]. All of these methods follow a dendrogram and attempt to identify the edges to be removed or added by optimizing the resulting change in modularity. The number of communities is

0167-2789/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.physd.2009.03.015