Exploiting Binary Abstractions in Deciphering Gene Interactions
Sungroh Yoon, Abhishek Garg, Eui-Young Chung, Hyun Seok Park, Woong Yang Park, and
Giovanni De Micheli, Fellow, IEEE
Abstract— We consider computationally reconstructing gene
regulatory networks on top of the binary abstraction of gene
expression state information. Unlike previous Boolean network
approaches, the proposed method does not handle noisy gene
expression values directly. Instead, two-valued “hidden state”
information is derived from gene expression profiles using a
robust statistical technique, and a gene interaction network is
inferred from this hidden state information. In particular, we
exploit Espresso, a well-known 2-level Boolean logic optimizer,
to determine the core network structure. The resulting
gene interaction networks can be viewed as dynamic Bayesian
networks, which have key advantages over more conventional
Bayesian networks in terms of biological phenomena that
can be represented. We tested the proposed method
with a time-course gene expression data set from microarray
experiments on the anti-cancer drugs doxorubicin and paclitaxel.
A gene interaction network was produced by our method, and
the identified genes were validated with a public annotation
database. The experimental studies we conducted suggest that
the proposed method inspired by engineering systems can be
a very effective tool to decipher complex gene interactions in
living systems.
I. INTRODUCTION
Now equipped with well-established methods to sequence
genes in many organisms including humans, we want to un-
derstand the interactions of individual genes as a next step. In
particular, computational reconstruction of gene interaction
networks has received much attention since the invention of
the DNA microarray technology. Despite some controversial
issues involved [1], it remains true that the DNA microarray
technology delivers unprecedented throughput when we want
to monitor the expression of a whole genome simultaneously.
Large-scale gene expression data sets obtained from DNA
microarray experiments provide invaluable information for
gene network inference algorithms that typically require a
large amount of empirical data.
This work was supported by a grant of Jerry Yang and Akiko Yamazaki.
S. Yoon is with the Computer Systems Laboratory, Stanford University,
Stanford, CA 94305, USA (E-mail: sryoon@stanford.edu)
A. Garg and G. De Micheli are with the Integrated Systems Center, Swiss
Federal Institute of Technology (EPFL), Lausanne, CH-1015, Switzerland
E.-Y. Chung is with the Department of Electrical and Electronic
Engineering, Yonsei University, Seoul 120-749, Korea
H. S. Park is with the Department of Computer Science and Engineering,
Ewha Womans University, Seoul 120-750, Korea and Macrogen
Corporation, Seoul 153-023, Korea
W. Y. Park is with the Department of Biochemistry, Seoul National
University Medical School, Seoul 110-744, Korea
The problem of reverse-engineering a complex system
from its input and output behavior, as is the case in gene
network reconstruction, has already been extensively studied
in electrical engineering. In particular, approaches to model
gene networks using Boolean networks [2], [3] bear great
similarities to digital circuit synthesis. As will be seen
shortly, the network connectivity information in Boolean
networks can easily be obtained by Espresso, a well-known
2-level Boolean logic minimizer [4], even though this fact
remained unnoticed in previous work on Boolean networks.
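To make the link between logic minimization and network connectivity concrete, the following is a minimal sketch and not the procedure used in this paper. It does not call Espresso. Instead, it uses a brute-force search (an assumption made for illustration) to find the smallest set of input genes whose states at time t determine a target gene's state at time t+1. The support of a minimized two-level cover conveys the same kind of information. The function name `smallest_support` and the example rows are hypothetical.

```python
from itertools import combinations

def smallest_support(rows, n_inputs):
    """Return the smallest tuple of input indices that determines the output.

    rows: list of (inputs, output) pairs, where inputs is a tuple of 0/1 values.
    A subset of inputs is consistent if no two rows agree on that subset
    but disagree on the output.
    """
    for size in range(n_inputs + 1):
        for subset in combinations(range(n_inputs), size):
            seen = {}
            consistent = True
            for inputs, output in rows:
                key = tuple(inputs[i] for i in subset)
                # Record the first output seen for this key; later rows must match it.
                if seen.setdefault(key, output) != output:
                    consistent = False
                    break
            if consistent:
                return subset
    return None  # contradictory rows: identical inputs with different outputs

# Hypothetical observations for three genes (g0, g1, g2) at time t and the
# state of a target gene at time t+1. The target follows g0 AND NOT g1.
rows = [
    ((0, 0, 0), 0),
    ((0, 1, 1), 0),
    ((1, 0, 0), 1),
    ((1, 0, 1), 1),
    ((1, 1, 0), 0),
    ((1, 1, 1), 0),
]
regulators = smallest_support(rows, 3)
print(regulators)
```

In this toy example the search reports that only g0 and g1 are needed, which corresponds to two incoming edges for the target gene. The brute-force search is exponential in the number of genes, which is one reason a heuristic minimizer is attractive for larger networks.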
Although Boolean network approaches can be computationally
more efficient than alternatives and are therefore
scalable to larger gene networks, the impact of Boolean
methods on the bioinformatics field has been somewhat limited.
A possible reason is that representing gene expression
levels by only two states can be an oversimplification of
continuous biological signals. Another reason may come from the
deterministic nature of most Boolean approaches. Biological
data can often be noisy, and statistical methods may provide
a more robust solution. To alleviate the second problem,
probabilistic Boolean networks have been proposed [5], [6],
but they still suffer from the first issue – oversimplification
of gene expression profiles.
Given the mature technologies and tools for processing
binary information in engineering, we claim that a binary
abstraction of biological information remains a very
appealing technique for inferring gene interaction networks,
provided that binary abstractions are used in the right context.
Most previous Boolean approaches start with a binary repre-
sentation of gene expression profiles: if a gene expression
level is lower than a threshold, the gene expression is
considered 0 or OFF; otherwise the expression is regarded
as 1 or ON. As previously stated, this bifurcation of gene
expression values tends to be overly simplistic and prone
to error. Just as Boolean logic minimizers are not designed
for noisy electrical signals obtained directly from analog
sensors, gene expression profiles, which are noisy biological
signals from biosensors, are often not suitable for binary
representation and processing.
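The fragility of direct thresholding can be illustrated with a small sketch. The expression values and the threshold below are made up for illustration and are not taken from the data set used in this paper. The function name `binarize` is hypothetical.

```python
def binarize(profile, threshold):
    """Map each expression value to 0 (OFF) if below the threshold, else 1 (ON)."""
    return [1 if value >= threshold else 0 for value in profile]

# Hypothetical time course for one gene; early values hover near the threshold.
clean = [0.48, 0.49, 0.52, 0.53, 0.90, 0.95]
# The same profile with small measurement noise added (made-up numbers).
noisy = [0.51, 0.47, 0.49, 0.55, 0.88, 0.97]

threshold = 0.5
clean_bits = binarize(clean, threshold)
noisy_bits = binarize(noisy, threshold)

# Time points where noise alone changed the binary state.
flipped = [t for t, (a, b) in enumerate(zip(clean_bits, noisy_bits)) if a != b]
print(clean_bits, noisy_bits, flipped)
```

Perturbations of a few hundredths are enough to flip the binary state at two of the six time points here, and any deterministic Boolean inference applied afterward would treat those flips as real state changes.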
In this paper, we introduce a new computational method
to build gene interaction networks from time-course gene
expression data. This method distinguishes two types of
states associated with gene expression. The observed state
of a gene is its empirically observed expression value. The
hidden state of a gene represents the biological state that
gave rise to the observed state. The hidden state information is
inferred from the observed state information by a statistical
approach that accounts for noise in expression values. It is the hidden
state information that is represented and processed in its
binary form. The proposed method analyzes the hidden state
information and finally produces gene interaction networks.
The introduction of hidden states resembles the state defi-
nitions in conventional hidden Markov models (HMMs) [7].
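This section does not specify the statistical technique that derives hidden states, so the following is only an illustrative stand-in: a two-state Gaussian HMM decoded with the Viterbi algorithm. The emission means, the shared standard deviation, the state persistence probability, and the function name `viterbi_two_state` are all assumptions chosen for the example.

```python
import math

def viterbi_two_state(observed, means, sigma, p_stay):
    """Most likely 0/1 hidden-state path under a two-state Gaussian HMM.

    observed: list of expression values (the observed states)
    means:    (mean for hidden state 0, mean for hidden state 1), assumed
    sigma:    shared emission standard deviation, assumed
    p_stay:   probability of keeping the same hidden state, assumed
    """
    def log_emit(x, state):
        # Gaussian log-density, dropping a constant shared by both states.
        return -((x - means[state]) ** 2) / (2.0 * sigma ** 2)

    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    # Uniform prior over the initial hidden state.
    score = [log_emit(observed[0], s) for s in (0, 1)]
    back = []
    for x in observed[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [
                score[prev] + (log_stay if prev == s else log_switch)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + log_emit(x, s))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical noisy profile: low for four time points, then high.
observed = [0.1, 0.2, 0.6, 0.15, 0.9, 0.85, 0.45, 0.95]
hidden = viterbi_two_state(observed, means=(0.2, 0.8), sigma=0.2, p_stay=0.9)
naive = [1 if x >= 0.5 else 0 for x in observed]
print(hidden, naive)
```

With these assumed parameters, the decoded hidden states form a single OFF-to-ON transition, whereas naive thresholding at 0.5 produces spurious flips at the third and seventh time points. The binary abstraction is then applied to the decoded path rather than to the raw values.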
Proceedings of the 28th IEEE
EMBS Annual International Conference
New York City, USA, Aug 30-Sept 3, 2006
SaEP4.8
1-4244-0033-3/06/$20.00 ©2006 IEEE.