Random Boolean network models and the yeast transcriptional network Stuart Kauffman*, Carsten Peterson †‡ , Bjo ¨ rn Samuelsson † , and Carl Troein † *Department of Cell Biology and Physiology, University of New Mexico Health Sciences Center, Albuquerque, NM 87131; and † Complex Systems Division, Department of Theoretical Physics, Lund University, So ¨ lvegatan 14A, S-223 62 Lund, Sweden Communicated by Philip W. Anderson, Princeton University, Princeton, NJ, October 6, 2003 (received for review June 30, 2003) The recently measured yeast transcriptional network is analyzed in terms of simpliﬁed Boolean network models, with the aim of determining feasible rule structures, given the requirement of stable solutions of the generated Boolean networks. We ﬁnd that, for ensembles of generated models, those with canalyzing Boolean rules are remarkably stable, whereas those with random Boolean rules are only marginally stable. Furthermore, substantial parts of the generated networks are frozen, in the sense that they reach the same state, regardless of initial state. Thus, our ensemble approach suggests that the yeast network shows highly ordered dynamics. genetic networks | dynamical systems T he regulatory network for Saccharomyces cerevisiae was recently measured (1) for 106 of the 141 known transcription factors by determining the bindings of transcription factor proteins to promoter regions on the DNA. Associating the promoter regions with genes yields a network of directed gene– gene interactions. As described in refs. 1 and 2, the significance of measured bindings with regard to inferring putative interac- tions are quantified in terms of P values. Lee et al. (1) did not infer interactions having P values above a threshold value, P th = 0.001, for most of their analysis. Small threshold values, P th , correspond to a small number of inferred interactions with high quality, whereas larger values correspond to more inferred connections, but of lower quality. It was found that for the P th = 0.001 network, the fan-out from each transcription factor to its regulated targets is substantial, on the average 38 (1). From the underlying data (http:web.wi.mit.eduyoungregulatory network), one finds that fairly few signals feed into each of them; on the average 1.9. The experiments yield the regulatory network architecture but yield neither the interaction rules at the nodes, nor the dynamics of the system, nor its final states. With no direct experimental results on the states of the system, there is, of course, no systematic method to pin down the interaction rules, not even within the framework of simplified and coarse-grained genetic network models; e.g., ones where the rules are Boolean. One can nevertheless attempt to investigate to what extent the measured architecture can, based on criteria of stability, select between classes of Boolean models (3). We generate ensembles of different model networks on the given architecture and analyze their behavior with respect to stability. In a stable system, small initial perturbations should not grow in time. This aspect is investigated by monitoring how the Hamming distances between different initial states evolve in a Derrida plot (4). If small Hamming distances diverge in time, the system is unstable and vice versa. Based on this criterion, we find that synchronously updated random Boolean networks (with a f lat rule distribution) are marginally stable on the transcriptional network of yeast. By using a subset of Boolean rules, nested canalyzing functions (see Methods and Models), the ensemble of networks exhibits remarkable stability. The notion of nested canalyzing functions is introduced to provide a natural way of generating canalyzing rules, which are abundant in biology (5). Furthermore, it turns out that for these networks, there exists a fair amount of forcing structures (3), where nonnegligible parts of the networks are frozen to fixed final states regardless of the initial conditions. Also, we investigate the consequences of rewiring the network while retaining the local properties; the number of inputs and outputs for each node (6). To accomplish the above, some tools and techniques were developed and used. To include more interactions besides those in the P th = 0.001 network (1), we investigated how network properties, local and global, change as P th is increased. We found a transition slightly above P th = 0.005, indicating the onset of noise in the form of biologically irrelevant inferred connections. In ref. 5, extensive literature studies revealed that, for eu- karyotes, the rules seem to be canalyzing. We developed a convenient method to generate a distribution of canalyzing rules, that fit well with the list of rules presented by Harris et al. (5). Methods and Models Choosing Network Architecture. Lee et al. (1) calculated P values as measures of confidence in the presence of an interaction. With further elucidation of noise levels, one might increase the threshold for P values from the value 0.001 used in Lee et al. (1). To this end, we computed various network properties, to inves- tigate whether there is any value of P th for which these properties exhibit a transition that can be interpreted as the onset of noise. In Fig. 1, the number of nodes, mean connectivity, mean pairwise distance (radius), and fraction of node pairs connected are shown. As can be seen, there appears to be a transition slightly above P th = 0.005. In what follows, we therefore focus on the network defined by P th = 0.005. Furthermore, we (recursively) remove genes that have no outputs to other genes, because these are not relevant for the network dynamics. The resulting network is shown in Fig. 2. Generating Rules. Lee et al. (1) determined the architecture of the network but not the specific rules for the interactions. To investigate the dynamics on the measured architecture, we repeatedly assign a random Boolean rule to each node in the network. We use two rule distributions; one null hypothesis and one distribution that agrees with rules compiled from the literature (ref. 5; see also Supporting Text, which is published as supporting information on the PNAS web site). In both cases, we ensure that every rule depends on all of its inputs because the dependence should be consistent with the network architecture. As a null hypothesis, we use a flat distribution among all Boolean functions that depend on all inputs. For rules with a few inputs, this will create rules that can be expressed with normal Boolean functions in a convenient way. In the case of many inputs, most rules are unstructured and the result of toggling one input value will appear random. In biological systems, the distribution of rules is likely to be structured. Indeed, all of the rules compiled by Harris et al. (5) are canalyzing (3); a canalyzing Boolean function (3) has at least one input, such that for at least one input value, the output value ‡ To whom correspondence should be addressed. E-mail: carsten@thep.lu.se. © 2003 by The National Academy of Sciences of the USA 14796 –14799 | PNAS | December 9, 2003 | vol. 100 | no. 25 www.pnas.orgcgidoi10.1073pnas.2036429100