Inference of Genetic Regulatory Networks by Evolutionary Algorithm and H Filter Lijun Qian and Haixin Wang Department of Electrical and Computer Engineering Prairie View A&M University Prairie View, Texas 77446 Email: LiQian, HWang@pvamu.edu Abstract— The correct inference of genetic regulatory networks plays a critical role in understanding biological regulation in phenotypic determination and it can affect advanced genome- based therapeutics. In this study, we propose a joint evolutionary algorithm and Hfiltering approach to infer genetic regulatory networks using noisy time series data from microarray mea- surements. Specifically, an iterative algorithm is proposed where genetic programming is applied to identify the structure of the model and Hfiltering is used to estimate the parameters in each iteration. The proposed method can obtain accurate dynamic nonlinear ordinary differential equation (ODE) model of genetic regulatory networks even when the noise statistics is unknown. Both synthetic data and experimental data from microarray measurements are used to demonstrate the effectiveness of the proposed method. With the increasing availability of time series microarray data, the algorithm developed in this paper could be applied to construct models to characterize cancer evolution and serve as the basis for developing new regulatory therapies. I. I NTRODUCTION A genetic regulatory network (GRN) is a collection of DNA segments in a cell which interact with each other and with other substances in the cell, thereby governing the gene transcriptions. The correct inference of genetic regulatory networks plays a critical role in understanding biological reg- ulation in phenotypic determination and it can affect advanced genome-based therapeutics. In light of the recent development of high-throughput DNA microarray technology, it becomes possible to discover GRNs, which are complex and nonlinear in nature. Specifically, the increasing existence of microar- ray time-series data makes possible the characterization of dynamic nonlinear regulatory interactions among genes. The modeling, analysis and control of GRNs are critical for study- ing cancer evolution and may serve as the basis for developing new regulatory therapies. Because GRN models are difficult to deduce solely by means of experimental techniques, computational and math- ematical methods are indispensable. Much research has been done on GRN modeling by linear differential/difference equa- tions using time-series data, for example, [5], [1], [4], [3], [2], [6], [7], [8], just to name a few. The basic idea is to approximate the combined effects of different genes by means of a weighted sum of their expression levels. In [5], a connec- tionist model is used to model small gene networks operating in the blastoderm of Drosophila. In [1], the concentrations of mRNA and protein are modeled by linear differential equa- tions. A simple form of linear additive functions is suggested by [2], where dx i /dt = n j=1 w ij x j . The degradation rate of gene i’s mRNA and environmental effects are assumed to be incorporated in the parameters w ij and their influence on gene i’s expression level x i is assumed to be linear. A method to obtain a continuous linear differential equation model from sampled time-series data is proposed in [7]. For added biological realism (all concentrations get saturated at some point in time), a sigmoid (squashing) function may be included into the equation. It has been show that this sort of quasi-linear model can be solved by first applying the inverse of the squashing function [3]. In our study, a GRN is modeled by continuous nonlinear Ordinary Differential Equations (ODEs). Compared to linear models, identification of the nonlinear differential equation model is computationally more intensive and can require more data; however, the range of nonlinear behaviors ex- hibited by GRNs can be more thoroughly understood with nonlinear differential equations. In addition, well established dynamical systems theory is available to characterize the dynamics produced by these models. When more time-series data become available owing to advances in microarray or other technologies, and assuming continued improvement in computational capability, it can be expected that continuous nonlinear dynamic models will play a critical role in revealing complicated gene behavior. In general, modeling gene regulatory networks is a nonlinear identification problem. Assuming there are N genes of interest and x i denotes the state (such as the microarray reading) of the i th gene, then the dynamics of the GRN may be modeled as dx i dt = f i (x 1 ,x 2 , ··· ,x N )+ ν i i =1, 2, ··· ,N. (1) where the nonlinear functions f i need to be determined from time-series microarray measurements. In this study, we assume the functions (f i , i) are in the form of polynomials. f i = Li j=1 [(w ij + µ ij ij (x 1 ,x 2 , ··· ,x N )] i =1, 2, ··· ,N. (2) where L i is the number of terms in f i , w ij are the parameters to be estimated and Ω ij (x 1 ,x 2 , ··· ,x N ) is the j th component