GRAPH ENTROPY RATE MINIMIZATION AND THE COMPRESSIBILITY OF
UNDIRECTED BINARY GRAPHS
Marcos E. Bolanos, Selin Aviyente, and Hayder Radha
Department of Electrical and Computer Engineering
Michigan State University
2120 Engineering Building
East Lansing, MI, 48824, USA
ABSTRACT
With the increasing popularity of complex network analysis through
the use of graphs, a method for computing graph entropy has become
important for a better understanding of a network’s structure and for
compressing large complex networks. There have been many differ-
ent definitions of graph entropy in the literature which incorporate
random walks, degree distribution, and node centrality. However,
these definitions are either computationally complex or seemingly
ad hoc. In this paper we propose a new approach for computing
graph entropy with the intention of quantifying the compressibility
of a graph. We demonstrate the effectiveness of our measure by
identifying the lower bound of the entropy rate for scale-free, lattice,
star, random, and real-world networks.
1. INTRODUCTION
The structural basis of various complex systems, including biologi-
cal and social processes, can be modeled using graphs. The underly-
ing structure of networks can have a strong influence over the flow of
information, spread of diseases, and sharing of ideas [1]. This struc-
ture has been characterized through different graph measures such as
the diameter, clustering coefficient, cost, efficiency, and path length.
In information theory [2], entropy is a measure of the uncer-
tainty associated with a random variable. The original definition of
graph entropy was introduced by Körner [3], which quantifies the
lower bound on the complexity of graphs. However, computing Körner's
entropy is NP-hard, making its evaluation impractical for real-world
networks. Recently, Dehmer et al. proposed to quantify the
complexity of a graph using Shannon's definition of entropy
such that the probability distribution is computed from node degree
[4]. This measure quantifies entropy using the localized features of
a graph’s nodes such as closeness centrality and degree centrality.
These centrality measures, however, are limited to local neighborhoods
and therefore do not fully capture the complexity of a graph, and the
approach appears to be ad hoc due to the arbitrary choice of
distribution functionals. The Kolmogorov-Sinai entropy rate can also be
used to compute entropy by evaluating random walks along the graph [5].
This measure of graph entropy rate was proposed by Burda et al. [6]
and recently implemented by Sinatra et al. [7] to quantify the maximum
level of information diffusion across a network. This measure,
however, is plagued by a problem similar to that of Dehmer's measure:
a dependence on node degree, which is the weakest measure of
network connectivity [8].
This work was supported in part by the National Science Foundation
under CAREER Grant No. CCF-0746971.
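The random-walk (Kolmogorov-Sinai) entropy rate discussed above can be sketched for an unweighted, undirected graph as follows. This is a minimal illustration of the degree-dependent measure being critiqued, not the measure proposed in this paper; it assumes transition probabilities $P_{ij} = A_{ij}/d(i)$ and the stationary distribution $\pi_i = d(i)/\sum_k d(k)$:

```python
import numpy as np

def random_walk_entropy_rate(A):
    """Entropy rate (bits/step) of a simple random walk on an
    undirected graph with 0/1 symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                       # node degrees d(i)
    P = A / d[:, None]                      # P_ij = A_ij / d(i)
    pi = d / d.sum()                        # stationary distribution
    # log2 only where P > 0; zero entries contribute nothing.
    logP = np.where(P > 0, np.log2(np.where(P > 0, P, 1.0)), 0.0)
    return -np.sum(pi[:, None] * P * logP)  # -sum_ij pi_i P_ij log2 P_ij

# Example: a 4-node cycle. Every step offers 2 equally likely moves,
# so the entropy rate is exactly 1 bit per step.
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]])
print(random_walk_entropy_rate(C4))  # → 1.0
```

Because both $P$ and $\pi$ here depend only on node degrees, graphs with identical degree sequences receive identical entropy rates, which is precisely the weakness noted above.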
Burda’s entropy rate is not suitable for evaluating graph com-
pressibility since a graph cannot be uniquely reconstructed solely
with knowledge of its degree sequence. This motivates the need for a
new entropy measure which may lead to an appropriate and practical
coding algorithm for reconstructing the graph from its compressed
version. In this paper, we propose a new measure of graph entropy
rate for an undirected binary graph by modeling the adjacency ma-
trix as a Markov process. We demonstrate the performance of this
method for evaluating entropy of well-known network models such
as star, lattice, random, scale-free, and modular as well as three real-
world networks. We also compare the estimated entropy rates with
the compression rate of a graph via a well-known coding algorithm,
Lempel-Ziv.
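The comparison against a coding algorithm can be sketched as follows: serialize the upper triangle of the adjacency matrix as a bitstream and measure its compressed size. This is a rough illustration, not the paper's experimental setup; Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding) stands in for the Lempel-Ziv coder, and the star/random test graphs are hypothetical:

```python
import zlib
import numpy as np

def compressed_bits_per_entry(A):
    """Serialize the strict upper triangle of a 0/1 adjacency matrix
    and report the DEFLATE-compressed size in bits per matrix entry."""
    A = np.asarray(A, dtype=np.uint8)
    iu = np.triu_indices(A.shape[0], k=1)   # strict upper triangle
    packed = np.packbits(A[iu])             # pack 0/1 values into bytes
    compressed = zlib.compress(packed.tobytes(), 9)
    return 8.0 * len(compressed) / len(iu[0])

# A highly regular graph (star) should compress far better than a
# dense Erdos-Renyi random graph of the same size.
rng = np.random.default_rng(0)
m = 200
star = np.zeros((m, m), dtype=np.uint8)
star[0, 1:] = star[1:, 0] = 1
rand = (rng.random((m, m)) < 0.5).astype(np.uint8)
rand = np.triu(rand, 1)
rand = rand + rand.T
print(compressed_bits_per_entry(star) < compressed_bits_per_entry(rand))  # → True
```

The gap between the compressed rate and a graph's estimated entropy rate indicates how close a practical coder comes to the theoretical bound.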
2. BACKGROUND
A graph is defined as $G = (V, E)$ where $V$ is the set of $m$ vertices
and $E$ is the set of edges assigned to a node pair, $v_i$ and $v_j$
[9]. An unweighted adjacency matrix $A = [A_{ij}]$, where
$i, j = 1, 2, \ldots, m$, stores the connectivity information of the
graph as a matrix of 0s and 1s such that $A_{ij} = 1$ if
$e_{ij} \in E$ and 0 otherwise, where $e_{ij}$ is an edge between
nodes $v_i$ and $v_j$. In this study, simple (i.e., no self-loops or
parallel edges) binary undirected graphs are considered. The degree
of a node, $d(i)$, is the number of immediate neighbors connected to
it, i.e., $d(i) = \sum_j A_{ij}$.
3. A GRAPH ENTROPY RATE
Entropy rate of a Markov process is formally defined as

$$H(\chi) = -\sum_{ij} \pi_i P_{ij} \log_2 P_{ij} \qquad (1)$$

where $P = [P_{ij}]$ is the probability transition matrix such that
$i, j = 1, 2, \ldots, m$, $P_{ij} = \Pr(X_{r+1} = j \mid X_r = i)$,
and $\pi$ is the stationary distribution. In this paper, we introduce
a new approach for computing the entropy rate of a graph, $H(G)$, by
first applying a scanning function upon the elements of a permuted
adjacency matrix. The scanning function generates a stochastic
process $X_1, X_2, \ldots, X_r$, for $r = 1, 2, \ldots$, represented
by a binary sequence. If $X$ has the property of an $n$th-order
Markov process, we define the general form of the entropy rate for a
graph as

$$H(G; n) \triangleq \min_{Z, \psi} H_x(Z A Z^T; \psi) \qquad (2)$$

where $Z$ is a permutation matrix applied to the adjacency matrix,
$A$, and $\psi$ is the particular scanning function of the upper triangular
2012 IEEE Statistical Signal Processing Workshop (SSP)
978-1-4673-0183-1/12/$31.00 ©2012 IEEE 109