INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM

Liuling Gong, Nidhal Bouaynaya* and Dan Schonfeld

University of Illinois at Chicago, Dept. of Electrical and Computer Engineering

ABSTRACT

In this paper, we investigate the information-theoretic bounds of the channel of evolution introduced in [1]. The channel of evolution is modeled as the iteration of protein communication channels over time, where the transmitted messages are protein sequences and the encoded message is the DNA. We compute the capacity and the rate-distortion functions of the protein communication system for the three domains of life: Archaea, Prokaryotes and Eukaryotes. We analyze the tradeoff between the transmission rate and the distortion in noisy protein communication channels. As expected, comparison of the optimal transmission rate with the channel capacity indicates that the biological fidelity does not reach the Shannon optimal distortion. However, the relationship between the channel capacity and the rate distortion achieved for different biological domains provides tremendous insight into the dynamics of the evolutionary processes. We rely on these results to provide a model of protein sequence evolution based on the two major evolutionary processes: mutations and unequal crossover.

Index Terms: Biological communication system; Channel capacity; Rate-distortion theory.

1. INTRODUCTION

The genetic information storage and transmission apparatus resembles engineering communication systems in many ways: the genomic information is digitally encoded in the DNA, and organisms come into being by decoding genes into proteins. The protein communication system, proposed in [1], [2] and shown in Fig. 1, is a communication model of the genetic information storage and transmission apparatus.
The protein communication system abstracts a cell as a set of proteins and models the process of cell division as an information communication system between protein sets. Using this mathematical model of protein communication, the problem of a species’ evolution is represented as the iteration of a communication channel over time.

* Nidhal Bouaynaya is currently in the Department of Systems Engineering at the University of Arkansas at Little Rock.

The genome is viewed as the joint source-channel encoded message of the protein communication system and hence can be investigated in the context of engineering communication codes. In particular, it is legitimate to ask at what rate the genomic information can be transmitted, and what the average distortion between the transmitted message and the received message is at this rate. Shannon’s channel capacity theorem states that, by properly encoding the source, a communication system can transmit information at a rate that is as close to the channel capacity as one desires with an arbitrarily small transmission error. Conversely, it is not possible to reliably transmit at a rate greater than the channel capacity. The theorem, however, is not constructive and does not provide any help in designing such codes. In the case of biological communication systems, however, evolution has already designed the code for us: the encoded message is the DNA sequence. Comparison of the genomic transmission rate with the channel capacity will reveal whether the genomic code is efficient from an information-theoretic perspective. However, even if the channel capacity is not exceeded, we are assured that biological communication systems do not rely on codes that produce negligible errors, since the level of distortion present must account for evolutionary processes.
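
The capacity bound discussed above can be illustrated numerically. The following is a minimal sketch, assuming a generic discrete memoryless channel given as a row-stochastic transition matrix, that approximates the capacity with the Blahut-Arimoto iteration. It is not taken from [1] and is not the computation used in this paper; the function names and the two toy binary channels are hypothetical and were chosen only because their capacities are known in closed form. A protein channel would need its own transition matrix, which is not reproduced here.

```python
import math


def mutual_information(p_x, channel):
    """Return I(X;Y) in bits, where channel[x][y] = P(Y=y | X=x)."""
    n_out = len(channel[0])
    # Output distribution induced by the input distribution p_x.
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    total = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_out):
            pyx = channel[x][y]
            if px > 0 and pyx > 0:
                total += px * pyx * math.log2(pyx / p_y[y])
    return total


def blahut_arimoto_capacity(channel, iterations=500):
    """Approximate C = max over p_x of I(X;Y); return (capacity, p_x)."""
    n_in = len(channel)
    n_out = len(channel[0])
    p_x = [1.0 / n_in] * n_in  # start from the uniform input distribution
    for _ in range(iterations):
        p_y = [sum(p_x[x] * channel[x][y] for x in range(n_in))
               for y in range(n_out)]
        # Reweight each input by exp of the KL divergence (in nats)
        # between its output row and the current output distribution.
        weights = []
        for x in range(n_in):
            d = sum(channel[x][y] * math.log(channel[x][y] / p_y[y])
                    for y in range(n_out) if channel[x][y] > 0)
            weights.append(p_x[x] * math.exp(d))
        norm = sum(weights)
        p_x = [w / norm for w in weights]
    return mutual_information(p_x, channel), p_x


if __name__ == "__main__":
    # Toy example (assumption): binary symmetric channel, crossover 0.1.
    bsc = [[0.9, 0.1], [0.1, 0.9]]
    capacity, best_input = blahut_arimoto_capacity(bsc)
    print(capacity, best_input)
```

For the binary symmetric channel the result should agree with the closed form 1 - H(0.1), about 0.531 bits per channel use. Any code transmitting above this rate cannot be reliable, which is the sense in which the genomic transmission rate is compared with the capacity.
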
It is, therefore, interesting to ask whether biological communication systems maintain an optimal balance between the transmission rate and the distortion level needed to support adaptive evolution. Rate-distortion theory analyzes the optimal tradeoff between the transmission rate, R(D), and the distortion, D, in noisy communication channels. Given the fidelity, D, present in biological communication systems, comparison of the genomic transmission rate with the optimal rate R(D) can be used to determine whether or not the genomic code achieves the optimal rate-distortion criterion. Moreover, by equating the optimal rate R(D) with the channel capacity, C, we can determine whether the biological fidelity, D, reaches the Shannon optimum distortion.

In this paper, we will only compare the channel capacity and rate-distortion functions of a single-source memoryless protein communication system, modelling asexual reproduction. The two-source protein communication system, modelling sexual reproduction, is more involved mathematically and will not be addressed here.

1-4244-1198-X/07/$25.00 ©2007 IEEE SSP 2007
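
The idea of equating R(D) with C to find the Shannon optimum distortion can be made concrete on a toy case. The sketch below assumes a binary Bernoulli(p) source with Hamming distortion, for which R(D) = H(p) - H(D) for 0 <= D <= min(p, 1 - p) and zero otherwise, and one channel use per source symbol. The binary source, the function names, and the bisection tolerance are all assumptions made for illustration; the rate-distortion functions of the protein communication system are not reproduced here.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def rate_distortion_binary(p, d):
    """R(D) in bits for a Bernoulli(p) source under Hamming distortion."""
    d_max = min(p, 1 - p)
    if d >= d_max:
        return 0.0
    return binary_entropy(p) - binary_entropy(d)


def shannon_optimum_distortion(p, capacity, tol=1e-12):
    """Smallest D with R(D) <= capacity, found by bisection.

    R(D) is non-increasing in D, so the crossing point is unique.
    """
    lo, hi = 0.0, min(p, 1 - p)
    if rate_distortion_binary(p, lo) <= capacity:
        return 0.0  # the channel can carry the source without distortion
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rate_distortion_binary(p, mid) > capacity:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Toy example (assumption): fair binary source sent over a binary
    # symmetric channel with crossover 0.1, so C = 1 - H(0.1).
    capacity = 1 - binary_entropy(0.1)
    print(shannon_optimum_distortion(0.5, capacity))
```

In this toy case the optimum distortion equals the channel crossover probability, 0.1, since 1 - H(D) = 1 - H(0.1). An observed distortion above this value would indicate that the code does not reach the Shannon optimum, which is the kind of comparison described above for the biological fidelity.
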