Random Boolean network models and the yeast
transcriptional network
Stuart Kauffman*, Carsten Peterson
†‡
, Bjo ¨ rn Samuelsson
†
, and Carl Troein
†
*Department of Cell Biology and Physiology, University of New Mexico Health Sciences Center, Albuquerque, NM 87131; and
†
Complex Systems Division,
Department of Theoretical Physics, Lund University, So ¨ lvegatan 14A, S-223 62 Lund, Sweden
Communicated by Philip W. Anderson, Princeton University, Princeton, NJ, October 6, 2003 (received for review June 30, 2003)
The recently measured yeast transcriptional network is analyzed in
terms of simplified Boolean network models, with the aim of
determining feasible rule structures, given the requirement of
stable solutions of the generated Boolean networks. We find that,
for ensembles of generated models, those with canalyzing Boolean
rules are remarkably stable, whereas those with random Boolean
rules are only marginally stable. Furthermore, substantial parts of
the generated networks are frozen, in the sense that they reach the
same state, regardless of initial state. Thus, our ensemble approach
suggests that the yeast network shows highly ordered dynamics.
genetic networks | dynamical systems
T
he regulatory network for Saccharomyces cerevisiae was
recently measured (1) for 106 of the 141 known transcription
factors by determining the bindings of transcription factor
proteins to promoter regions on the DNA. Associating the
promoter regions with genes yields a network of directed gene–
gene interactions. As described in refs. 1 and 2, the significance
of measured bindings with regard to inferring putative interac-
tions are quantified in terms of P values. Lee et al. (1) did not
infer interactions having P values above a threshold value, P
th
=
0.001, for most of their analysis. Small threshold values, P
th
,
correspond to a small number of inferred interactions with high
quality, whereas larger values correspond to more inferred
connections, but of lower quality. It was found that for the P
th
=
0.001 network, the fan-out from each transcription factor to its
regulated targets is substantial, on the average 38 (1). From the
underlying data (http:web.wi.mit.eduyoungregulatory
network), one finds that fairly few signals feed into each of them;
on the average 1.9. The experiments yield the regulatory network
architecture but yield neither the interaction rules at the nodes,
nor the dynamics of the system, nor its final states.
With no direct experimental results on the states of the system,
there is, of course, no systematic method to pin down the
interaction rules, not even within the framework of simplified
and coarse-grained genetic network models; e.g., ones where the
rules are Boolean. One can nevertheless attempt to investigate
to what extent the measured architecture can, based on criteria
of stability, select between classes of Boolean models (3).
We generate ensembles of different model networks on the
given architecture and analyze their behavior with respect to
stability. In a stable system, small initial perturbations should not
grow in time. This aspect is investigated by monitoring how the
Hamming distances between different initial states evolve in a
Derrida plot (4). If small Hamming distances diverge in time, the
system is unstable and vice versa. Based on this criterion, we find
that synchronously updated random Boolean networks (with a
f lat rule distribution) are marginally stable on the transcriptional
network of yeast.
By using a subset of Boolean rules, nested canalyzing functions
(see Methods and Models), the ensemble of networks exhibits
remarkable stability. The notion of nested canalyzing functions
is introduced to provide a natural way of generating canalyzing
rules, which are abundant in biology (5). Furthermore, it turns
out that for these networks, there exists a fair amount of forcing
structures (3), where nonnegligible parts of the networks are
frozen to fixed final states regardless of the initial conditions.
Also, we investigate the consequences of rewiring the network
while retaining the local properties; the number of inputs and
outputs for each node (6).
To accomplish the above, some tools and techniques were
developed and used. To include more interactions besides those
in the P
th
= 0.001 network (1), we investigated how network
properties, local and global, change as P
th
is increased. We found
a transition slightly above P
th
= 0.005, indicating the onset of
noise in the form of biologically irrelevant inferred connections.
In ref. 5, extensive literature studies revealed that, for eu-
karyotes, the rules seem to be canalyzing. We developed a
convenient method to generate a distribution of canalyzing rules,
that fit well with the list of rules presented by Harris et al. (5).
Methods and Models
Choosing Network Architecture. Lee et al. (1) calculated P values
as measures of confidence in the presence of an interaction. With
further elucidation of noise levels, one might increase the
threshold for P values from the value 0.001 used in Lee et al. (1).
To this end, we computed various network properties, to inves-
tigate whether there is any value of P
th
for which these properties
exhibit a transition that can be interpreted as the onset of noise.
In Fig. 1, the number of nodes, mean connectivity, mean pairwise
distance (radius), and fraction of node pairs connected are
shown. As can be seen, there appears to be a transition slightly
above P
th
= 0.005. In what follows, we therefore focus on the
network defined by P
th
= 0.005. Furthermore, we (recursively)
remove genes that have no outputs to other genes, because these
are not relevant for the network dynamics. The resulting network
is shown in Fig. 2.
Generating Rules. Lee et al. (1) determined the architecture of the
network but not the specific rules for the interactions. To
investigate the dynamics on the measured architecture, we
repeatedly assign a random Boolean rule to each node in the
network. We use two rule distributions; one null hypothesis and
one distribution that agrees with rules compiled from the
literature (ref. 5; see also Supporting Text, which is published as
supporting information on the PNAS web site). In both cases, we
ensure that every rule depends on all of its inputs because the
dependence should be consistent with the network architecture.
As a null hypothesis, we use a flat distribution among all
Boolean functions that depend on all inputs. For rules with a few
inputs, this will create rules that can be expressed with normal
Boolean functions in a convenient way. In the case of many
inputs, most rules are unstructured and the result of toggling one
input value will appear random.
In biological systems, the distribution of rules is likely to be
structured. Indeed, all of the rules compiled by Harris et al. (5)
are canalyzing (3); a canalyzing Boolean function (3) has at least
one input, such that for at least one input value, the output value
‡
To whom correspondence should be addressed. E-mail: carsten@thep.lu.se.
© 2003 by The National Academy of Sciences of the USA
14796 –14799 | PNAS | December 9, 2003 | vol. 100 | no. 25 www.pnas.orgcgidoi10.1073pnas.2036429100