GRAPH ENTROPY RATE MINIMIZATION AND THE COMPRESSIBILITY OF UNDIRECTED BINARY GRAPHS

Marcos E. Bolanos, Selin Aviyente, and Hayder Radha
Department of Electrical and Computer Engineering, Michigan State University
2120 Engineering Building, East Lansing, MI 48824, USA

ABSTRACT

With the increasing popularity of complex network analysis through the use of graphs, a method for computing graph entropy has become important both for better understanding a network's structure and for compressing large complex networks. Many different definitions of graph entropy in the literature incorporate random walks, degree distribution, and node centrality. However, these definitions are either computationally complex or seemingly ad hoc. In this paper we propose a new approach for computing graph entropy with the aim of quantifying the compressibility of a graph. We demonstrate the effectiveness of our measure by identifying the lower bound of the entropy rate for scale-free, lattice, star, random, and real-world networks.

1. INTRODUCTION

The structural basis of various complex systems, including biological and social processes, can be modeled using graphs. The underlying structure of a network can strongly influence the flow of information, the spread of diseases, and the sharing of ideas [1]. This structure has been characterized through different graph measures such as diameter, clustering coefficient, cost, efficiency, and path length.

In information theory [2], entropy is a measure of the uncertainty associated with a random variable. The original definition of graph entropy was introduced by Körner [3] and quantifies a lower bound on the complexity of a graph. However, computing Körner's entropy is NP-hard, which makes its evaluation for real-world networks impractical. Recently, Dehmer et al.
proposed to quantify the complexity of a graph using Shannon's definition of entropy, with the probability distribution computed from node degree [4]. This measure quantifies entropy using localized features of a graph's nodes, such as closeness centrality and degree centrality. These centrality measures, however, do not fully capture the complexity of a graph, i.e., they are limited to local neighborhoods, and the approach appears ad hoc due to the arbitrary choice of distribution functionals. The Kolmogorov-Sinai entropy rate can also be used to compute entropy by evaluating random walks along the graph [5]. This measure of graph entropy rate was proposed by Burda et al. [6] and recently applied by Sinatra et al. [7] to quantify the maximum level of information diffusion across a network. This measure, however, suffers from a problem similar to that of Dehmer's measure: a dependence on node degree, which is the weakest measure of network connectivity [8].

Burda's entropy rate is not suitable for evaluating graph compressibility, since a graph cannot be uniquely reconstructed from knowledge of its degree sequence alone. This motivates the need for a new entropy measure that may lead to an appropriate and practical coding algorithm for reconstructing a graph from its compressed version. In this paper, we propose a new measure of graph entropy rate for an undirected binary graph by modeling the adjacency matrix as a Markov process. We demonstrate the performance of this method by evaluating the entropy of well-known network models (star, lattice, random, scale-free, and modular) as well as three real-world networks. We also compare the estimated entropy rates with the compression rate achieved on a graph by a well-known coding algorithm, Lempel-Ziv.

This work was supported in part by the National Science Foundation under Grant No. CAREER CCF-0746971.

2.
BACKGROUND

A graph is defined as G = (V, E), where V is the set of m vertices and E is the set of edges, each assigned to a node pair (v_i, v_j) [9]. An unweighted adjacency matrix A = [A_{ij}], where i, j = 1, 2, ..., m, stores the connectivity information of the graph as a matrix of 0s and 1s such that A_{ij} = 1 if e_{ij} \in E and 0 otherwise, where e_{ij} is an edge between nodes v_i and v_j. In this study, simple (i.e., no self-loops or parallel edges) binary undirected graphs are considered. The degree of a node, d(i), is the number of immediate neighbors connected to it, i.e., d(i) = \sum_j A_{ij}.

3. A GRAPH ENTROPY RATE

The entropy rate of a Markov process is formally defined as

H(\chi) = -\sum_{i,j} \pi_i P_{ij} \log_2 P_{ij}    (1)

where P = [P_{ij}] is the probability transition matrix with i, j = 1, 2, ..., m and P_{ij} = Pr(X_{r+1} = j | X_r = i), and \pi is the stationary distribution. In this paper, we introduce a new approach for computing the entropy rate of a graph, H(G), by first applying a scanning function to the elements of a permuted adjacency matrix. The scanning function generates a stochastic process X_1, X_2, ..., X_r, for r = 1, 2, ..., represented by a binary sequence. If X has the property of an n-th order Markov process, we define the general form of the entropy rate of a graph as

H(G; n) \triangleq \min_{Z, \psi} H_\chi(Z A Z^T; \psi)    (2)

where Z is a permutation matrix applied to the adjacency matrix A, and \psi is the particular scanning function of the upper triangular

2012 IEEE Statistical Signal Processing Workshop (SSP)
978-1-4673-0183-1/12/$31.00 ©2012 IEEE
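To make Eqs. (1)–(2) concrete, the following is a minimal Python sketch (not the authors' code) of the inner entropy-rate computation for one fixed choice: the identity permutation Z = I, a row-major scan of the upper triangle as the scanning function \psi, and a first-order (n = 1) Markov model with transition probabilities and state occupancies estimated empirically from the scanned binary sequence. The function names are illustrative; the full measure of Eq. (2) would additionally minimize over permutations Z and scanning functions \psi.

```python
import numpy as np

def adjacency_scan(A):
    """One possible scanning function psi: read the upper triangle of a
    binary adjacency matrix row by row into a 0/1 sequence."""
    m = A.shape[0]
    return [int(A[i, j]) for i in range(m) for j in range(i + 1, m)]

def markov_entropy_rate(seq):
    """Empirical first-order Markov entropy rate of a binary sequence:
    H = -sum_{i,j} pi_i P_ij log2 P_ij, with P and pi estimated from
    transition counts (pi is the empirical state occupancy)."""
    counts = np.zeros((2, 2))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Transition matrix P; rows with no observations stay all-zero.
    P = np.divide(counts, row_sums, out=np.zeros_like(counts),
                  where=row_sums > 0)
    pi = counts.sum(axis=1) / counts.sum()
    # 0 * log2(0) is taken as 0, as usual in entropy calculations.
    logP = np.zeros_like(P)
    mask = P > 0
    logP[mask] = np.log2(P[mask])
    return float(-np.sum(pi[:, None] * P * logP))

# Example: a 5-node star graph (node 0 connected to all others).
m = 5
A = np.zeros((m, m), dtype=int)
A[0, 1:] = 1
A[1:, 0] = 1
h = markov_entropy_rate(adjacency_scan(A))  # entropy rate in bits/symbol
```

The star graph's scan produces a run of 1s followed by a run of 0s, so its estimated entropy rate is well below 1 bit per symbol, consistent with the paper's premise that structured graphs are highly compressible.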