INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A
PROTEIN COMMUNICATION SYSTEM
Liuling Gong, Nidhal Bouaynaya* and Dan Schonfeld
University of Illinois at Chicago, Dept. of Electrical and Computer Engineering,
ABSTRACT
In this paper, we investigate the information-theoretic bounds
of the channel of evolution introduced in [1]. The channel of
evolution is modeled as the iteration of protein communica-
tion channels over time, where the transmitted messages are
protein sequences and the encoded message is the DNA. We
compute the capacity and the rate-distortion functions of the
protein communication system for the three domains of life:
Archaea, Prokaryotes and Eukaryotes. We analyze the tradeoff
between the transmission rate and the distortion in noisy
protein communication channels. As expected, comparing
the optimal transmission rate with the channel capacity
indicates that the biological fidelity does not reach the
Shannon optimum distortion. However, the relationship between
the channel capacity and the rate-distortion function obtained
for different biological domains provides tremendous insight
into the dynamics of the evolutionary processes. We rely on
these results to provide a model of protein sequence evolution
based on the two major evolutionary processes: mutations and
unequal crossover.
Index Terms— Biological communication system; Chan-
nel capacity; Rate-distortion theory.
1. INTRODUCTION
The genetic information storage and transmission apparatus
resembles engineering communication systems in many ways:
genomic information is digitally encoded in the DNA, and
organisms come into being when genes are decoded into
proteins. The protein communication system, proposed in [1],
[2] and shown in Fig. 1, is a communication model of the
genetic information storage and transmission apparatus. The
protein communication system abstracts a cell as a set of
proteins and models the process of cell division as an
information communication system between protein sets. Using
this mathematical model of protein communication, we represent
the evolution of a species as the iteration of a communication
channel over time.
* Nidhal Bouaynaya is currently in the Department of Systems Engineering
at the University of Arkansas at Little Rock.
The genome is viewed as the joint source-channel encoded message
of the protein communication system and hence can be investigated
in the context of engineering communication codes. In particular,
it is legitimate to ask at what rate the genomic information can be
transmitted, and what the average distortion between the transmitted
message and the received message is at this rate. Shannon’s channel capacity
theorem states that, by properly encoding the source, a com-
munication system can transmit information at a rate that is
as close to the channel capacity as one desires with an
arbitrarily small transmission error. Conversely, it is not
possible to transmit reliably at a rate greater than the channel
capacity. The theorem, however, is not constructive: it does
not show how to design such codes. In biological communication
systems, evolution has already designed the code for us.
The encoded message is
the DNA sequence. Comparing the genomic transmission rate
with the channel capacity will reveal whether the genomic
code is efficient from an information-theoretic perspective.
However, even if the channel capacity is not exceeded, we know
that biological communication systems do not rely on codes that
produce negligible errors, since the level of distortion
present must account for evolutionary processes.
It is, therefore, interesting to ask ourselves whether biologi-
cal communication systems maintain an optimal balance be-
tween the transmission rate and the desired distortion level
needed to support adaptive evolution. Rate-distortion theory
analyzes the optimal tradeoff between the transmission rate,
R(D), and distortion, D, in noisy communication channels.
Given the fidelity, D, present in biological communication
systems, comparison of the genomic transmission rate with
the optimal rate R(D) can be used to determine whether or
not the genomic code achieves the optimal rate-distortion
criterion. Moreover, by equating the optimal rate R(D) with the
channel capacity, C, we can determine whether the biological
fidelity, D, reaches the Shannon optimum distortion. In this
paper, we will only compare the channel capacity and
rate-distortion functions of a single-source memoryless protein
communication system, modelling asexual reproduction. The
two-source protein communication system, modelling sexual
reproduction, is more involved mathematically and will not
be addressed here.
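To make the comparison between R(D) and C concrete, the following Python sketch computes the capacity of a discrete memoryless channel with the Blahut-Arimoto iteration and then solves R(D) = C for D. It is only an illustration under stated assumptions, not the channel model of [1]: the 20-letter symmetric channel (one letter per amino acid), the substitution probability of 0.1, the uniform source, and the Hamming distortion measure are all hypothetical choices, and the function names are illustrative. For a uniform source over q letters with Hamming distortion, the rate-distortion function has the closed form R(D) = log2 q - h(D) - D log2(q - 1) for 0 <= D <= 1 - 1/q, where h is the binary entropy function.

```python
import math


def blahut_arimoto_capacity(W, tol=1e-12, max_iter=10_000):
    """Capacity in bits per symbol of a discrete memoryless channel.

    W[i][j] = P(output j | input i); every row must sum to 1.
    """
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(p[i] * W[i][j] for i in range(n_in)) for j in range(n_out)]
        # d[i] = KL divergence between row i and q, in bits (0 log 0 = 0).
        d = [
            sum(W[i][j] * math.log2(W[i][j] / q[j])
                for j in range(n_out) if W[i][j] > 0.0)
            for i in range(n_in)
        ]
        lower = sum(p[i] * d[i] for i in range(n_in))  # mutual information
        upper = max(d)                                 # upper bound on C
        if upper - lower < tol:
            break
        # Multiplicative update of the input distribution.
        p = [p[i] * 2.0 ** d[i] for i in range(n_in)]
        total = sum(p)
        p = [x / total for x in p]
    return lower


def binary_entropy(x):
    """Binary entropy h(x) in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def rate_distortion_uniform_hamming(D, q):
    """R(D) in bits for a uniform q-ary source under Hamming distortion."""
    if D >= 1.0 - 1.0 / q:
        return 0.0
    return math.log2(q) - binary_entropy(D) - D * math.log2(q - 1)


def distortion_at_capacity(C, q, tol=1e-12):
    """Solve R(D) = C by bisection; R(D) is decreasing on [0, 1 - 1/q]."""
    lo, hi = 0.0, 1.0 - 1.0 / q
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_distortion_uniform_hamming(mid, q) > C:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def symmetric_channel(q, p_err):
    """Hypothetical q-ary symmetric channel with total error probability p_err."""
    off = p_err / (q - 1)
    return [[1.0 - p_err if i == j else off for j in range(q)]
            for i in range(q)]


if __name__ == "__main__":
    q, p_err = 20, 0.1  # assumed alphabet size and substitution probability
    C = blahut_arimoto_capacity(symmetric_channel(q, p_err))
    print("capacity C (bits/symbol):", C)
    print("distortion D with R(D) = C:", distortion_at_capacity(C, q))
```

For a symmetric channel, the distortion obtained by equating R(D) with C coincides with the channel's substitution probability, which gives a simple sanity check. An asymmetric transition matrix can be passed to blahut_arimoto_capacity in the same way.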
1 1-4244-1198-X/07/$25.00 ©2007 IEEE SSP 2007