A String-based Model for Simple Gene Assembly

Robert Brijder (1), Miika Langille (2), and Ion Petre (2,3)

(1) Leiden Institute of Advanced Computer Science, Universiteit Leiden
Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
rbrijder@liacs.nl

(2) Turku Centre for Computer Science
Department of IT, Åbo Akademi University
Turku 20520, Finland
miika.langille@abo.fi, ion.petre@abo.fi

(3) Academy of Finland

Abstract. The simple intramolecular model for gene assembly in ciliates is particularly interesting because it predicts the correct assembly for all available experimental data, even though it is not universal. The simple model also has a confluence property that the general model lacks. A previous formalization of the simple model through sorting of signed permutations is unsatisfactory because it effectively ignores one operation of the model and therefore cannot be used to answer questions about parallelism in the model or about measures of complexity. In this paper we propose a string-based model in which a gene is represented by its sequence of pointers and markers, and its assembly is represented as a string rewriting process. We prove that this string-based model is equivalent to the permutation-based model as far as gene assembly is concerned, while it tracks all operations of the model.

1 Introduction

Gene assembly in ciliates has been the subject of extensive combinatorial research in recent years; see [2]. Ciliates are unicellular eukaryotes that organize their genome differently in their two types of nuclei. In micronuclei, genes are split into blocks (called MDSs) that are placed in a shuffled order on the chromosome and separated by non-coding blocks. Moreover, some of the MDSs occur in inverted form. In macronuclei, however, genes are contiguous sequences of nucleotides, with all blocks sorted in the orthodox order.
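The idea of representing a gene by its sequence of pointers and markers can be illustrated with a small sketch. It assumes a convention not stated in this excerpt: a gene with k MDSs is given as a signed permutation of 1..k in micronuclear order (a negative entry marks an inverted MDS); MDS i begins with pointer i and ends with pointer i+1, except that MDS 1 begins with a marker b and MDS k ends with a marker e; and an inverted MDS contributes its symbols in reverse order, each barred (written here with a leading "-"). The function names `mds_symbols`, `invert`, and `pointer_string` are hypothetical.

```python
from typing import List


def mds_symbols(i: int, k: int) -> List[str]:
    """Return the [start, end] symbols of MDS i in a gene with k MDSs.

    Assumed convention (hypothetical): MDS i starts with pointer i and ends
    with pointer i+1; MDS 1 starts with marker 'b' and MDS k ends with 'e'.
    """
    if not 1 <= i <= k:
        raise ValueError(f"MDS index {i} out of range 1..{k}")
    start = "b" if i == 1 else str(i)
    end = "e" if i == k else str(i + 1)
    return [start, end]


def invert(symbol: str) -> str:
    """Bar a symbol, written here with a leading '-'; barring twice cancels."""
    return symbol[1:] if symbol.startswith("-") else "-" + symbol


def pointer_string(gene: List[int]) -> List[str]:
    """Map a signed permutation of MDSs to its sequence of pointers and markers.

    `gene` lists the MDSs in micronuclear order; an entry -i means MDS i
    occurs inverted, so its symbols are reversed and each one is barred.
    """
    k = len(gene)
    if sorted(abs(x) for x in gene) != list(range(1, k + 1)):
        raise ValueError("gene must be a signed permutation of 1..k")
    result: List[str] = []
    for x in gene:
        symbols = mds_symbols(abs(x), k)
        if x < 0:
            # Inverted MDS: read its symbols backwards and bar each one.
            symbols = [invert(s) for s in reversed(symbols)]
        result.extend(symbols)
    return result
```

Under these assumptions, the orthodox order 1, 2, 3 yields b 2 2 3 3 e, in which the two occurrences of each pointer are adjacent, while the shuffled gene 3, -1, 2 yields 3 e -2 -b 2 3.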
The assembly of genes from their micronuclear to their macronuclear form has a distinctly combinatorial and computational flavor: each MDS M ends with a sequence of nucleotides (called a pointer) that is repeated identically at the beginning of the MDS that should follow M in the macronuclear gene.

The exact kinetic mechanisms of gene assembly remain to be clarified through further laboratory experiments. Two models have been proposed for gene assembly: an intermolecular one, see [7, 8], and an intramolecular one, see [3, 10]. The intramolecular model, which we consider in this paper, consists of