The Max-Min High-Order Dynamic Bayesian Network Learning for Identifying Gene Regulatory Networks from Time-Series Microarray Data

Yifeng Li and Alioune Ngom
School of Computer Science, University of Windsor, Windsor, Ontario, Canada
{li11112c, angom}@uwindsor.ca

Abstract—We propose a new high-order dynamic Bayesian network (HO-DBN) learning approach, called Max-Min High-Order DBN (MMHO-DBN), for discrete time-series data. MMHO-DBN explicitly models the time lags between parents and target in an efficient manner. It extends the Max-Min Hill-Climbing Bayesian network (MMHC-BN) technique, which was originally devised for learning a BN's structure from static data. Both Max-Min approaches are hybrid local learning methods which fuse concepts from constraint-based Bayesian techniques and search-and-score Bayesian methods. MMHO-DBN first uses constraint-based ideas to limit the space of potential structures and then applies search-and-score ideas to search for an optimal HO-DBN structure. We evaluated the ability of our MMHO-DBN approach to identify genetic regulatory networks (GRNs) from gene expression time-series data. Preliminary results on artificial and real gene expression time-series are encouraging and show that it is able to learn (long) time-delayed relationships between genes, and that it is faster than current HO-DBN learning methods.

Keywords-Gene Regulatory Networks, High-Order Relationships, Dynamic Bayesian Network, Max-Min Heuristic

I. INTRODUCTION

Accurate and fast reconstruction of genetic regulatory networks (GRNs) is an important task that has recently become possible due to large-scale high-throughput experiments such as microarray experiments [1]. Gene expression levels obtained over a sufficiently large number of time-points can be used to identify GRNs. It is well known that the expression of a given gene can affect how certain other genes are expressed, either down-regulating or up-regulating them.
A GRN represents such causal interactions among genes, encoding all the temporal dependencies between genes in an organism [2]. Regulatory events within an organism are asynchronous; that is, different genes can regulate other genes at different time-scales and with different delays.

Accurate and efficient reconstruction of GRNs from expression time-series data is a computationally hard task, in particular due to the fact that expression levels are measured for a large number of genes, numbering in the thousands, but over a small number of time-points, numbering in the tens. GRN identification methods based on ordinary differential equations (ODEs) [2] will be prohibitively slow on such an amount of data. GRN inference techniques such as Boolean network methods [2] are not causal and are not very robust to noise and uncertainty in the data. GRN reconstruction approaches based on probabilistic graphical models (PGMs), such as Bayesian networks and Markov random fields [2], have become more popular due to their inherent ability to process uncertain data and their robustness to noise; missing data can also be handled. PGMs are also more efficient for processing large numbers of genes [3].

Bayesian networks (BNs) are PGMs which compactly represent a joint probability distribution among a set of variables [4]. They are directed acyclic graphs (DAGs) which can appropriately model GRNs; that is, they model genes as nodes and causal dependencies between genes as edges [5]. Due to their acyclicity constraint, BNs are unable to model self-regulations, feedback loops, and time-delayed interactions, which are characteristic of GRNs. Dynamic BNs (DBNs) were proposed to tackle these limitations by unrolling a BN over time [6]. In DBNs, a transition network between any two consecutive time-points characterizes the GRN; that is, only genes at time-point t − 1 are supposed to regulate genes at time-point t.
This is a first-order assumption, which allows temporal causal dependencies among genes to be modeled. First-order DBNs (FO-DBNs), however, cannot model time-delayed interactions longer than one time step. To this end, high-order DBNs (HO-DBNs) were introduced by [7] to model longer time-delayed interactions.

In this paper, we contribute a new HO-DBN structure learning algorithm, called Max-Min HO-DBN (MMHO-DBN), based on an appropriate extension of the original MMHC-BN algorithm of [4], which was devised to alleviate the limitations of the current approaches for learning the structure of BNs from static data.

The rest of this paper is organized as follows. Section II presents GRN modeling with HO-DBNs. In Section III, we discuss current methods for learning HO-DBN structures for reconstructing GRNs from microarray time-series data. We then introduce our MMHO-DBN structure learning method in Section IV. Preliminary results and discussions are presented in Section V. Finally, we conclude and suggest possible directions for research in HO-DBN learning.

978-1-4673-5875-0/13/$31.00 © 2013 IEEE

II. MODELING TIME-DELAYED REGULATIONS WITH HO-DBNS

Let us consider a gene expression time-series data set $g_{T \times N} = (g_1, \ldots, g_T)^{\mathrm{T}}$ summarizing the observations (i.e.,