International Journal of Modern Physics C, Vol. 12, No. 7 (2001) 1043–1053
© World Scientific Publishing Company

MULTIPLE BASE SUBSTITUTION CORRECTIONS IN DNA SEQUENCE EVOLUTION

M. KOWALCZUK*, P. MACKIEWICZ*, D. SZCZEPANIK*, A. NOWICKA*, M. DUDKIEWICZ*, M. R. DUDEK†, and S. CEBRAT*

*Institute of Microbiology, University of Wrocław, ul. Przybyszewskiego 63/77, 54-148 Wrocław, Poland
†Institute of Physics, Pedagogical University of Zielona Góra, 65-069 Zielona Góra, Poland

Received 18 June 2001
Revised 25 June 2001

We discuss the inability of Jukes and Cantor's one-parameter model and Kimura's two-parameter model to describe the evolution of asymmetric DNA molecules. The standard distance measure between two DNA sequences, the number of substitutions per site, should include the effect of multiple base substitutions separately for each type of base. Otherwise, the corresponding tables of substitutions cannot reconstruct a DNA molecule with asymmetric nucleotide composition. Based on Kimura's neutral theory, we have derived a linear law relating the mean survival time of nucleotides under constant mutation pressure to their fraction in the genome. Guided by this law, we discuss the corrections to Kimura's theory needed to describe the evolution of genomes with asymmetric nucleotide composition. We consider the particular case of the strongly asymmetric Borrelia burgdorferi genome and discuss in detail the corrections that should be introduced into the distance measure between two DNA sequences to account for multiple base substitutions.

Keywords: DNA Evolution; Replication; Tables of Substitutions.

1. Introduction

Measuring the evolutionary distance between two DNA sequences requires knowledge of the substitution rates of the nucleotides. Each DNA sequence is composed of four different nucleotides: adenine (A), guanine (G), thymine (T) and cytosine (C).
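For reference, the two classical corrections named in the abstract can be written down directly. The sketch below uses the standard textbook forms of the Jukes–Cantor and Kimura two-parameter distances, then adds a hypothetical toy substitution model to show why a model in which every base is substituted at the same total rate always relaxes to a uniform 25% composition. The toy rates, the base ordering A, C, G, T and all function names are assumptions for illustration; this is not the authors' corrected distance measure, nor their derivation of the linear law.

```python
import math

# Standard textbook distances; d is the estimated number of substitutions per site.

def jukes_cantor_distance(p):
    """Jukes-Cantor (one-parameter) distance from the fraction p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the Jukes-Cantor correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p_distance(P, Q):
    """Kimura two-parameter distance from transition (P) and transversion (Q) fractions."""
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("the Kimura correction is undefined for these fractions")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


# Hypothetical toy model (not from the paper). Base order: A, C, G, T.
# Base i is substituted at total rate out_rates[i] and is replaced by each of
# the other three bases with equal probability; rates[i][j] is the rate of i -> j.

def uniform_target_rates(out_rates):
    return [[0.0 if i == j else out_rates[i] / 3.0 for j in range(4)]
            for i in range(4)]


def mean_survival_times(rates):
    """Mean waiting time before a base is substituted: 1 / total rate out of it."""
    return [1.0 / sum(r for j, r in enumerate(row) if j != i)
            for i, row in enumerate(rates)]


def equilibrium_composition(rates, dt=0.01, steps=20000):
    """Euler-integrate df_i/dt = sum_j f_j r[j][i] - f_i sum_j r[i][j],
    starting from a sequence made only of A, until the composition settles."""
    out = [sum(r for j, r in enumerate(row) if j != i) for i, row in enumerate(rates)]
    f = [1.0, 0.0, 0.0, 0.0]
    for _ in range(steps):
        inflow = [sum(f[j] * rates[j][i] for j in range(4) if j != i) for i in range(4)]
        f = [f[i] + dt * (inflow[i] - f[i] * out[i]) for i in range(4)]
    return f


# One common rate for every base (the Jukes-Cantor assumption): about 25% of each base.
print(equilibrium_composition(uniform_target_rates([0.75] * 4)))
# Hypothetical asymmetric rates: A and T are substituted half as often as C and G.
print(equilibrium_composition(uniform_target_rates([0.5, 1.0, 1.0, 0.5])))
```

In this toy model the equilibrium fraction of each base comes out proportional to its mean survival time (A and T settle near 1/3 each, C and G near 1/6), whereas a single common rate forces a uniform composition. This is consistent with the abstract's argument that one- and two-parameter corrections cannot reproduce an asymmetric genome; the per-base corrections developed in the paper are not reproduced here.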
A specific sequence of these nucleotides determines the information carried by the DNA molecule. In particular, the information that is translated into proteins is encoded by the genetic code, a specific set of nucleotide triplets (codons), each of which codes for one amino acid.¹ Although there are 64 possible triplets, there are only twenty amino acids. The genetic code is therefore degenerate: a given amino acid can be coded by more than one codon. In fact, the third nucleotide position of the codon is the most degenerate. Therefore, for the same function, two different organisms can use the