How Gene Survival Depends on Their Length

Natalia Polak¹, Joanna Banaszak¹, Pawel Mackiewicz¹, Małgorzata Dudkiewicz¹, Maria Kowalczuk¹, Dorota Mackiewicz¹, Kamila Smolarczyk¹, Aleksandra Nowicka¹, Mirosław R. Dudek², and Stanisław Cebrat¹⋆

¹ Department of Genomics, Institute of Genetics and Microbiology, University of Wroclaw, ul. Przybyszewskiego 63/77, PL-54148 Wroclaw, Poland
{malgosia, pamac, nowicka, kowal, dorota, polak, smolar, cebrat}@microb.uni.wroc.pl
http://smORFland.microb.uni.wroc.pl

² Institute of Physics, University of Zielona Góra, ul. A. Szafrana 4a, PL-65516 Zielona Góra, Poland
mdudek@proton.if.uz.zgora.pl

Abstract. Gene survival depends on the mutational pressure acting on the gene sequences and on the selection pressure for the function of the gene products. While the probability of the occurrence of mutations inside genes depends roughly linearly on their length, the probability of elimination of their function does not grow linearly with the length because of the intragenic suppression effect. Furthermore, the probability of redefinition of the stop and start codons is independent of the gene length, while shortening of gene sequences by generating stop codons inside them depends on gene length.

1 Introduction

One of the many mechanisms that introduce mutations into genomes is single nucleotide substitution, which happens during DNA replication. There are four kinds of nucleotides: Adenine (A), Thymine (T), Guanine (G), and Cytosine (C). Substitution of one of them by any of the three others is random but highly biased. Some nucleotides are substituted more often than others, and the substituting nucleotides are also unevenly "chosen" [1]. Thus, for each of the twelve possible kinds of nucleotide substitution, a specific probability of the event can be experimentally estimated and put into the "matrix of substitutions" (Tab. 1) [2].
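The idea of a biased substitution matrix can be illustrated with a minimal Python sketch. It is not the procedure used in this paper: the rates below are hypothetical placeholders and not the values of Tab. 1, and the names `SUBSTITUTION_RATES` and `substitute_once` are introduced here only for illustration. The sketch assumes that a position is hit with a probability proportional to the total rate at which its nucleotide is substituted, and that the replacing nucleotide is then drawn according to the corresponding row of the matrix.

```python
import random

# Hypothetical relative substitution rates (outer key: original nucleotide,
# inner key: substituting nucleotide). Placeholder numbers for illustration
# only; they are NOT the experimentally estimated values of Tab. 1.
SUBSTITUTION_RATES = {
    "A": {"C": 0.02, "G": 0.07, "T": 0.03},
    "C": {"A": 0.07, "G": 0.05, "T": 0.26},
    "G": {"A": 0.16, "C": 0.05, "T": 0.12},
    "T": {"A": 0.03, "C": 0.10, "G": 0.04},
}


def substitute_once(sequence, rng):
    """Apply one biased single nucleotide substitution to `sequence`.

    Returns the mutated sequence and the index of the substituted position.
    """
    # Less stable nucleotides (larger total outgoing rate) are hit more often.
    position_weights = [sum(SUBSTITUTION_RATES[n].values()) for n in sequence]
    position = rng.choices(range(len(sequence)), weights=position_weights, k=1)[0]

    # The substituting nucleotide is also chosen unevenly.
    targets = SUBSTITUTION_RATES[sequence[position]]
    replacement = rng.choices(list(targets), weights=list(targets.values()), k=1)[0]

    mutated = sequence[:position] + replacement + sequence[position + 1:]
    return mutated, position


if __name__ == "__main__":
    rng = random.Random(2004)      # fixed seed so that runs are repeatable
    gene = "ATGGCCAAGCTGTAA"       # toy coding sequence, not real data
    mutated, position = substitute_once(gene, rng)
    print(position, gene[position], "->", mutated[position])
```

The twelve entries of the dictionary correspond to the twelve possible kinds of substitution; replacing the placeholders with experimentally estimated rates would give a simulation of the mutational pressure alone, with no selection applied.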
The most stable genes should be built of the most stable nucleotides. On the other hand, selection for gene function demands a rather specific composition of the gene products, which restricts not only the nucleotide composition of genes but also, more importantly, the proper length of the coding sequence. A substitution inside the coding sequence can exert very different effects on the amino

⋆ To whom all correspondence should be sent.

M. Bubak et al. (Eds.): ICCS 2004, LNCS 3039, pp. 694–699, 2004. © Springer-Verlag Berlin Heidelberg 2004