Models of Genome Evolution

Yi Zhou 1 and Bud Mishra 2, 3

1 Biology Department, New York University, 100 Washington Square East, New York, NY 10003, USA, joey@cs.nyu.edu
2 Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA, mishra@cs.nyu.edu
3 Watson School of Biological Sciences, Cold Spring Harbor Laboratory, 1 Bungtown Rd., Cold Spring Harbor, NY 11724, USA

Summary. The evolutionary theory, “evolution by duplication”, originally proposed by Susumu Ohno in 1970, can now be verified with the available genome sequences.

Recently, several mathematical models have been proposed to explain the topology of protein interaction networks that have also implemented the idea of “evolution by duplication”. The power law distribution with its “hubby” topology (e.g., P53 was shown to interact with an unusually large number of other proteins) can be explained if one makes the following assumption: new proteins, which are duplicates of older proteins, have a propensity to interact only with the same proteins as their evolutionary predecessors. Since protein interaction networks, as well as other higher-level cellular processes, are encoded in genomic sequences, the evolutionary structure, topology and statistics of many biological objects (pathways, phylogeny, symbiotic relations, etc.) are rooted in the evolution dynamics of the genome sequences.

Susumu Ohno’s hypothesis can be tested ‘in silico’ using Polya’s Urn model. In our model, each basic DNA sequence change is modeled using several probability distribution functions. The functions can decide the insertion/deletion positions of the DNA fragments, the copy numbers of the inserted fragments, and the sequences of the inserted/deleted pieces. Moreover, those functions can be interdependent. A mathematically tractable model can be created with a directed graph representation. Such graphs are Eulerian and each possible Eulerian path encodes a genome.
Every “genome duplication” event evolves these Eulerian graphs, and the probability distributions and their dynamics themselves give rise to many intriguing and elegant mathematical problems.

In this paper, we explore and survey these connections between biology, mathematics and computer science in order to reveal simple, and yet deep models of life itself.
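
As a rough illustration of the urn dynamics mentioned in the summary, the sketch below simulates a classical Polya urn: a ball is drawn at random, then returned together with one extra copy of the same color, so types that are already abundant are more likely to be duplicated again. This is only the textbook urn process, not the genome model described above; the function name `polya_urn`, the nucleotide-labeled starting urn, and the `steps` and `seed` parameters are assumptions made for the example.

```python
import random
from collections import Counter


def polya_urn(initial, steps, seed=0):
    """Simulate a classical Polya urn (illustrative sketch only).

    initial: dict mapping a color/label to its starting ball count.
    steps:   number of draw-and-duplicate rounds.
    seed:    seed for the private random generator (reproducibility).
    """
    rng = random.Random(seed)
    urn = Counter(initial)
    for _ in range(steps):
        colors = list(urn)
        weights = [urn[c] for c in colors]
        # Draw one ball with probability proportional to its color's count.
        drawn = rng.choices(colors, weights=weights, k=1)[0]
        # Return the ball and add one duplicate of the same color.
        urn[drawn] += 1
    return urn


if __name__ == "__main__":
    # Hypothetical starting urn: one ball per nucleotide label.
    final = polya_urn({"A": 1, "C": 1, "G": 1, "T": 1}, steps=1000)
    for color, count in sorted(final.items()):
        print(color, count)
```

Running the script with different seeds typically gives very uneven final counts, which is the "rich get richer" behavior that duplication-based models rely on.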