A Comparison of Community Identification Algorithms for Regulatory
Network Motifs
Douglas Oliveira and Marco Carvalho
Abstract— In recent years, high-throughput data about
biological processes have become available, opening a
wide range of research possibilities in multidisciplinary
areas such as network science. An idea that has been widely
accepted is that no life can exist without complex
systems formed by interacting macromolecules. Rather than
a single gene being responsible for a single phenotype (central
dogma), it has been shown that the interaction between several
genes is responsible for a given phenotype, a concept called
systems biology. Identifying patterns of interactions (motifs)
in these complex networks has attracted attention in the
scientific community, given that these networks are often very
dense and dynamic. In this work we focus on a particular kind
of biological network, a regulatory network in which each node
is a transcription factor and two nodes are connected if one
of them encodes a transcription factor that regulates the
other. We focus on a specific kind of motif, the dense
overlapping regulons (DOR) motif, in which the sets of genes
regulated by different transcription factors overlap more
than expected in a random network. We apply different
community identification algorithms in order to determine
which algorithm is best suited to the task of identifying
this particular motif.
I. INTRODUCTION
According to [1], most of the interesting accomplishments
achieved in biological research have been in genomics. One
example is the genome sequencing of many species, including
the human genome, which has created many possibilities
for a better understanding of the function of many genes from
large-scale sequencing processes. We currently have a good
understanding of life at the molecular level, and recognize
that we need to see gene structures not only in isolation but
also as sets, including how they interact with one another [2].
By accepting the concept of systems biology, we are not
denying the importance of reductionist approaches. Reductionist
approaches are simply limited in their ability to
present a comprehensive picture of life [1]. One fact
that supports the idea of systems biology is that individual
cells, when separated from their neighbors, lose many of their
functional and structural attributes [3].
The notion of systems biology dates back hundreds
of years, to when the word organism was first used to
describe living animals and plants as organizations in which
each part is reciprocally end and means. Several advantages
have arisen with this approach; for example, evolutionary
mechanisms can be better understood in light of complex
molecular systems [4].
D. Oliveira and M. Carvalho are with Florida Institute of
Technology, 150 W. University Blvd, Melbourne, FL, USA
doliveira2011@my.fit.edu, mcarvalho@cs.fit.edu
With the current availability of terabytes of data in many
domains, including biological processes, communications,
and social interactions, a variety of research activities have
started to focus on modeling and identifying global
network properties and characteristics. These include the
small-world property [5] and scale-free networks [6]. One
of the first network structures analyzed with this approach
was the network representing scientific collaborations and
co-publications [7]. While important, such global metrics
must be augmented with an understanding of the basic structural
elements, the building blocks of the network. These building
blocks are often referred to as network motifs [8] and
represent recurring structures and patterns of connections.
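As an illustration of the global metrics mentioned above, the following minimal sketch computes two quantities commonly associated with them: the average clustering coefficient (high values are one signature of small-world networks) and the degree distribution (a heavy tail is the signature of scale-free networks). The function names and the toy edge list are assumptions made for this example and are not taken from the cited works.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list (self-loops ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def degree_distribution(adj):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


# Hypothetical toy network: a triangle (a, b, c) with a pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
toy_adj = build_adjacency(toy_edges)
print(average_clustering(toy_adj))
print(dict(degree_distribution(toy_adj)))
```

Such summaries describe the network as a whole; they say nothing about which local wiring patterns recur, which is the gap that motif analysis addresses.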
In [8] the authors present several different kinds of motifs
commonly found in different types of networks. In their work,
the authors attribute the presence of the motifs to the way
in which the network was designed. More specifically, in
biological networks the work of [9] identifies three major
patterns that are significantly present in the network. Among
them, a motif called dense overlapping regulons (DOR)
deserves special attention. The motif is defined as a layer
of overlapping interactions that is much denser than
the corresponding structures in randomized networks. The
result is a structure characterized by regions of interactions
that are internally dense and loosely connected to one another.
These regions are often called communities.
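The comparison against randomized networks can be sketched as follows. This is a minimal illustration, not the procedure of [9]: it assumes the regulatory network is given as a list of (transcription factor, target gene) pairs, uses the mean pairwise Jaccard index of target sets as a hypothetical overlap score, and builds the null model by degree-preserving edge swaps. All names and the toy data are assumptions.

```python
import random
from itertools import combinations


def mean_target_overlap(edges):
    """Mean Jaccard overlap between the target sets of every pair of regulators."""
    targets = {}
    for tf, gene in edges:
        targets.setdefault(tf, set()).add(gene)
    pairs = list(combinations(sorted(targets), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        total += len(targets[a] & targets[b]) / len(targets[a] | targets[b])
    return total / len(pairs)


def randomize(edges, n_swaps, rng):
    """Degree-preserving null model: repeatedly swap the targets of two edges."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (tf1, g1), (tf2, g2) = edges[i], edges[j]
        new1, new2 = (tf1, g2), (tf2, g1)
        # Skip swaps that change nothing or would create a duplicate edge.
        if tf1 == tf2 or g1 == g2 or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


# Hypothetical toy network: two groups of regulators with fully shared targets.
toy_edges = [(tf, g) for tf in ("A", "B") for g in ("g1", "g2", "g3")]
toy_edges += [(tf, g) for tf in ("C", "D") for g in ("g4", "g5", "g6")]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
observed = mean_target_overlap(toy_edges)
null_scores = [mean_target_overlap(randomize(toy_edges, 200, rng)) for _ in range(50)]
null_mean = sum(null_scores) / len(null_scores)
print(observed, null_mean)
```

An observed score well above the scores of the randomized networks would suggest DOR-like structure under this particular overlap measure; a different measure or null model could lead to a different conclusion.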
There are many community identification algorithms in
the literature. In general, such algorithms rely on partitioning
the data into a certain number of communities (groups,
subsets or categories) [10]. There is no universally accepted
definition of a community, but most authors characterize a
community by its internal homogeneity and its external
separation [11]. In this work we evaluate the results of four
community identification algorithms in order to determine which
is best suited to the identification of DOR motifs in a
regulatory network.
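One widely used way to quantify internal homogeneity and external separation together is the modularity of a partition, which compares the fraction of edges falling inside each community with the fraction expected if edges were placed at random with the same node degrees. The sketch below is offered only as an illustration of that idea; the text above does not state that modularity is the criterion used by any of the four algorithms, and the function name and toy graph are assumptions.

```python
def modularity(edges, membership):
    """Modularity of a partition of an undirected, unweighted graph.

    edges      -- list of (u, v) pairs, each undirected edge listed once
    membership -- dict mapping each node to its community label
    """
    m = len(edges)
    internal = {}    # number of edges with both endpoints in the community
    degree_sum = {}  # sum of node degrees in the community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2.0 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
one_group = {n: "x" for n in range(1, 7)}

print(modularity(toy_edges, two_groups))
print(modularity(toy_edges, one_group))
```

In the toy graph, splitting along the bridge scores higher than lumping all nodes together, which matches the intuition of dense regions that are loosely connected to each other.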
II. RELATED WORK
Gene expression data is obtained through microarray
experiments [16] and is commonly used for the study of biological
networks. Community identification algorithms have been
widely applied to these kinds of datasets, for example in
the construction of coexpression networks [12]. In a coexpression
network each node represents a gene, and two nodes
are connected if their expression levels are similar [13].
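The construction of a coexpression network can be sketched as follows. The text above only requires that expression levels be "similar"; the use of the Pearson correlation coefficient and a fixed threshold of 0.9 are assumptions chosen for this example, as are the gene names and expression values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Connect two genes when their profiles correlate at or above `threshold`."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expression), 2)
        if pearson(expression[g1], expression[g2]) >= threshold
    ]


# Hypothetical expression profiles measured under four conditions.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [1.0, 3.0, 2.0, 4.0],  # only loosely follows geneA
}
print(coexpression_edges(toy_expression))
```

The resulting edge list can then be passed to any community identification algorithm; the choice of similarity measure and threshold strongly affects how dense the network is.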
The work of [14] presents the results of clustering 118 genes
using a hierarchical community identification algorithm, in
which members of the same cluster tend to participate
in common processes. In a later work [15], the authors
978-1-4799-3163-7/13/$31.00 ©2013 IEEE