Exploiting Binary Abstractions in Deciphering Gene Interactions

Sungroh Yoon, Abhishek Garg, Eui-Young Chung, Hyun Seok Park, Woong Yang Park, and Giovanni De Micheli, Fellow, IEEE

Abstract— We consider computationally reconstructing gene regulatory networks on top of the binary abstraction of gene expression state information. Unlike previous Boolean network approaches, the proposed method does not handle noisy gene expression values directly. Instead, two-valued "hidden state" information is derived from gene expression profiles using a robust statistical technique, and a gene interaction network is inferred from this hidden state information. In particular, we exploit Espresso, a well-known 2-level Boolean logic optimizer, to determine the core network structure. The resulting gene interaction networks can be viewed as dynamic Bayesian networks, which have key advantages over more conventional Bayesian networks in terms of the biological phenomena that can be represented. We tested the proposed method with a time-course gene expression data set from microarray experiments on the anti-cancer drugs doxorubicin and paclitaxel. A gene interaction network was produced by our method, and the identified genes were validated with a public annotation database. Our experimental studies suggest that the proposed method, inspired by engineering systems, can be a very effective tool to decipher complex gene interactions in living systems.

I. INTRODUCTION

Now equipped with well-established methods to sequence genes in many organisms including humans, we want to understand the interactions of individual genes as a next step. In particular, computational reconstruction of gene interaction networks has received much attention since the invention of DNA microarray technology.
Despite some controversial issues involved [1], it remains true that DNA microarray technology delivers unprecedented throughput when we want to monitor the expression of a whole genome simultaneously. Large-scale gene expression data sets obtained from DNA microarray experiments provide invaluable information for gene network inference algorithms, which typically require a large amount of empirical data.

The problem of reverse-engineering a complex system from its input and output behavior, as is the case in gene network reconstruction, has already been extensively studied in electrical engineering. In particular, approaches to model gene networks using Boolean networks [2], [3] bear great similarities to digital circuit synthesis. As will be seen shortly, the network connectivity information in Boolean networks can easily be obtained by Espresso, a well-known 2-level Boolean logic minimizer [4], even though this fact remained unnoticed in previous work on Boolean networks.

This work was supported by a grant from Jerry Yang and Akiko Yamazaki.
S. Yoon is with the Computer Systems Laboratory, Stanford University, Stanford, CA 94305, USA (E-mail: sryoon@stanford.edu).
A. Garg and G. De Micheli are with the Integrated Systems Center, Swiss Federal Institute of Technology (EPFL), Lausanne, CH-1015, Switzerland.
E.-Y. Chung is with the Department of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, Korea.
H. S. Park is with the Department of Computer Science and Engineering, Ewha Womans University, Seoul 120-750, Korea, and Macrogen Corporation, Seoul 153-023, Korea.
W. Y. Park is with the Department of Biochemistry, Seoul National University Medical School, Seoul 110-744, Korea.

Although Boolean network approaches can be computationally more efficient than alternatives and are therefore scalable to larger gene networks, the impact of Boolean methods on the bioinformatics field has been somewhat limited.
A possible reason is that representing gene expression levels by only two states can be an oversimplification of continuous biological signals. Another reason may come from the deterministic nature of most Boolean approaches. Biological data can often be noisy, and statistical methods may provide a more robust solution. To alleviate the second problem, probabilistic Boolean networks have been proposed [5], [6], but they still suffer from the first issue: oversimplification of gene expression profiles.

Given the mature technologies and tools for processing binary information in engineering, we claim that a binary abstraction of biological information remains a very appealing technique for inferring gene interaction networks, provided that binary abstractions are used in the right context. Most previous Boolean approaches start with a binary representation of gene expression profiles: if a gene's expression level is lower than a threshold, the gene expression is considered 0 or OFF; otherwise the expression is regarded as 1 or ON. As previously stated, this bifurcation of gene expression values tends to be overly simplistic and prone to error. Just as Boolean logic minimizers are not designed for noisy electrical signals obtained directly from analog sensors, gene expression profiles (which are noisy biological signals from biosensors) are often not suitable for binary representation and processing.

In this paper, we introduce a new computational method to build gene interaction networks from time-course gene expression data. This method distinguishes two types of states associated with gene expression. The observed state of a gene is its empirically observed expression value. The hidden state of a gene represents the biological state that caused the observed state. The hidden state information is deduced from the observed state information by a statistical approach that handles noise in expression values.
It is the hidden state information that is represented and processed in binary form. The proposed method analyzes the hidden state information and finally produces gene interaction networks.

The introduction of hidden states resembles the state definitions in conventional hidden Markov models (HMMs) [7].

Proceedings of the 28th IEEE EMBS Annual International Conference, New York City, USA, Aug 30-Sept 3, 2006. SaEP4.8. 1-4244-0033-3/06/$20.00 ©2006 IEEE.
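
To make the contrast between observed and hidden states concrete, the following Python sketch compares the threshold rule used by earlier Boolean approaches with a hidden-state decoding of the same noisy profile. The statistical technique actually used by the proposed method is not specified in this section; the sketch assumes, purely for illustration, a two-state Gaussian HMM decoded with the Viterbi algorithm. The function names, the emission means and standard deviations, the stay probability, and the sample profile are all hypothetical.

```python
import math


def threshold_binarize(values, threshold):
    """Naive rule: 1 (ON) if the value reaches the threshold, else 0 (OFF)."""
    return [1 if v >= threshold else 0 for v in values]


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def viterbi_two_state(values, means, stds, stay_prob=0.9):
    """Most likely 0/1 hidden-state path under an assumed two-state Gaussian HMM.

    Assumes a non-empty profile, a uniform initial state distribution, and a
    symmetric transition model (stay with stay_prob, switch otherwise).
    """
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)
    # Best log score of any path ending in state 0 or 1 at the first time point.
    score = [math.log(0.5) + gaussian_logpdf(values[0], means[s], stds[s])
             for s in (0, 1)]
    back = []  # back[t][s] = best predecessor of state s at time t + 1
    for x in values[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [score[p] + (log_stay if p == s else log_switch)
                          for p in (0, 1)]
            best = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best)
            new_score.append(candidates[best] + gaussian_logpdf(x, means[s], stds[s]))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical noisy time-course profile of a single gene.
observed = [0.1, 0.2, 0.7, 0.1, 0.9, 1.1, 0.4, 1.0, 0.9]

print(threshold_binarize(observed, 0.5))
# → [0, 0, 1, 0, 1, 1, 0, 1, 1]
print(viterbi_two_state(observed, means=(0.2, 1.0), stds=(0.3, 0.3)))
# → [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy profile, thresholding flips state at each noisy excursion (time points 3 and 7), whereas the decoded hidden-state path switches from OFF to ON only once. This is the kind of robustness that motivates binarizing hidden states rather than raw expression values.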