Microarray Analysis through Transcription Kinetic Modeling and Information Theory

Abdallah Sayyed-Ahmad, Kagan Tuncay and Peter Ortoleva*

Center for Cell and Virus Theory, Department of Chemistry, Indiana University, Bloomington IN 47405, USA

ABSTRACT

cDNA microarray and other multiplex data hold promise for addressing the challenges of cellular complexity, disease progression and drug discovery. We believe that combining transcription kinetic modeling with microarray time series data through information theory will yield more information about gene regulatory networks than has been obtained previously. A novel analysis of gene regulatory networks is presented, based on the integration of microarray data and cell modeling through information theory. Given a partial network and time series data, a probability density is constructed that is a functional of the time courses of the intra-nuclear transcription factor (TF) thermodynamic activities and a function of the RNA degradation and transcription rate constants and of the equilibrium constants for TF/gene binding. The most probable TF time courses and the most probable values of these parameters are computed. The accuracy and robustness of the method are evaluated, and an application to Escherichia coli is demonstrated. A kinetic (rather than steady-state) formulation allows us to analyze phenomena with a strongly dynamical character (e.g. the cell cycle, metabolic oscillations, viral infection or the response to changes in the extra-cellular medium).

1 INTRODUCTION

cDNA microarrays (Schena et al., 1995; DeRisi et al., 1997; Sauter et al., 2003) and other multiplex data acquisition techniques yield a great volume of information and thereby hold promise for addressing the challenges of cellular complexity, disease progression and drug discovery (Brown and Botstein, 1999; Debouck and Goodfellow, 1999; Gerhold et al., 1999; Chitler, 2004).
Recently we proposed an approach to the analysis of such data based on its integration with cell modeling through information theory (Sayyed-Ahmad et al., 2003). Here we show how this approach can be extended to analyze microarray time series data. The approach has the potential to extract more information on the network of cellular processes from these data than existing methods do.

Cell models hold great promise for understanding the complex networks of processes underlying cell behavior (Slepchenko et al., 2003; Weitzke and Ortoleva, 2003; Navid and Ortoleva, 2004). Unfortunately, they suffer from a lack of information about many of the rate and equilibrium constants for reaction and transport processes (Mendes and Kell, 1998; Sayyed-Ahmad et al., 2003). Furthermore, key aspects of the biochemical network have yet to be resolved; i.e., we are presented with the challenge of calibrating and using an incomplete model.

* To whom correspondence should be addressed. E-mail: ccvt@indiana.edu, Telephone: 812-855-2717, Fax: 812-855-8300

In contrast, microarray, protein mass