Automatica 46 (2010) 2047–2052
Journal homepage: www.elsevier.com/locate/automatica

Brief paper

Structural properties of continuous representations of Boolean functions for gene network modelling ✩

Saadia Faisal a, Gerwald Lichtenberg b,∗, Saskia Trump c, Sabine Attinger a

a Department of Computational Hydrosystems, Helmholtz Center for Environmental Research, 04318 Leipzig, Germany
b Institute of Control Systems, Hamburg University of Technology, 21073 Hamburg, Germany
c Department of Environmental Immunology, Helmholtz Center for Environmental Research, 04318 Leipzig, Germany

Article history: Received 19 June 2009; received in revised form 1 July 2010; accepted 9 July 2010; available online 8 October 2010

Keywords: Genetic networks; Boolean networks; Forcing functions; Canalizing functions; Zhegalkin polynomials

Abstract

This paper recaps and extends a new method for the parameter identification of Boolean models with continuous valued data. The proposed Zhegalkin identification method with constraints allows us to include a priori known qualitative properties of the system formulated as binary rules. One rule, the canalizing property, is investigated in particular because of its relevance in gene network modelling, from which an application example is given.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

This paper is motivated by current problems in gene network modelling, but it also provides quite general results on the identification of Boolean discrete-time systems. The introduction first motivates the systems biology perspective on gene dynamics and then gives mathematical generalizations of the problem.
Microarray technology, developed more than a decade ago, has allowed experimental biologists to measure the activity levels of all the genes of a genome quantitatively in a single experiment (Schena, Shalon, Davis, & Brown, 1995). These gene activities measured at a particular time step are generally referred to as gene expressions, and a set of such measurements is called the gene expression data. This data, reflecting the individual levels of activity of various genes simultaneously under various conditions, can be analyzed to obtain useful models of genome functions and consequently of cell behaviour. Because microarray experiments are quite expensive, only a few measurements are available compared to the number of model parameters needed. Although models are to be identified for many hundreds of genes, the maximum length of the available time series is still not more than a few dozen. This situation is in contrast to most engineering applications, where the number of measurements is usually much larger than the number of parameters. Thus, in gene network modelling, classical continuous system identification methods are hardly applicable, especially for larger values of the so-called connectivity degree, which gives the number of interacting genes for a certain process. A detailed review of gene network modelling can be found, e.g., in Bansal, Belcastro, Ambesi-Impiombato, and di Bernardo (2007) and Schlitt and Brazma (2007).

✩ The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Martin Guay under the direction of Editor Frank Allgöwer.
∗ Corresponding author. Tel.: +49 40 42878 3570; fax: +49 40 42878 2112.
E-mail addresses: saadia.faisal@ufz.de (S. Faisal), lichtenberg@tu-harburg.de (G. Lichtenberg), saskia.trump@ufz.de (S. Trump), sabine.attinger@ufz.de (S. Attinger).
doi:10.1016/j.automatica.2010.09.001

Boolean network models of gene dynamics predict for each gene at each time step whether it is expressed or not; the state of each gene is thus assumed to be either on or off (Kauffman, 2002). As gene networks share many characteristics with Boolean networks, such as periodicity, global complexity and self-organization, the Boolean idealization is convincing (Kauffman, 1993; Sniegoski & Somogyi, 1996; Szallasi & Liang, 1998; Zhang, Hayashida, Akutsu, Ching, & Ng, 2007). Although this idealization might seem simpler than modelling the continuous valued dynamics, it is quite complex from the computational standpoint. The main reason for this is the exponential growth of the number $2^{2^n}$ of possible