Genome Analysis

On the origin of MADS-domain transcription factors

Lydia Gramzow, Markus S. Ritz and Günter Theißen

Department of Genetics, Friedrich Schiller University Jena, Philosophenweg 12, D-07743 Jena, Germany

MADS-domain transcription factors are involved in signal transduction and developmental control in plants, animals and fungi. Because their diversification is linked to the origin of novelties in multicellular eukaryotes, the early evolution of MADS-domain proteins is of interest, but has remained enigmatic. Employing whole genome sequence information and remote homology detection methods, we demonstrate that the MADS domain originated from a region of topoisomerases IIA subunit A. Furthermore, we provide evidence that gene duplication occurred in the lineage that led to the MRCA of extant eukaryotes, giving rise to SRF-like and MEF2-like MADS-box genes.

The MADS domain

The MADS domain, named after the proteins MINICHROMOSOME MAINTENANCE 1, AGAMOUS, DEFICIENS and SERUM RESPONSE FACTOR (SRF), is the highly conserved DNA-binding domain of a large family of transcription factors encoded by MADS-box genes. In flowering plants (angiosperms), which contain a considerable number of these genes (e.g. 107 in Arabidopsis thaliana [1]), MADS-box genes are involved in controlling diverse morphogenetic processes [2]. By contrast, only a few MADS-box genes are present in animals (metazoans; e.g. two in Drosophila melanogaster, and five in human) and fungi (e.g. four in Saccharomyces cerevisiae), where they have important functions in cell proliferation and differentiation, and in pheromone response, respectively [3].

According to X-ray crystal structure analyses [4–6] (and references cited therein), the MADS domain folds into an N-terminal extension, followed by a long amphipathic α-helix, and two antiparallel β-strands. Depending on the definitions used by different authors, the MADS domain is considered to consist of 55–60 amino acids [2,7–10].
On the basis of sequence data and phylogeny reconstructions, two types of MADS domains, which are termed MYOCYTE ENHANCER FACTOR 2 (MEF2)-like and SRF-like, are distinguished [11]. The two domains show differences in DNA binding specificity and the amount of DNA bending they induce [12]. Despite these differences, the MADS-domain sequence is remarkably highly conserved in plants, fungi and metazoans [3]. Because it has never been identified in prokaryotes (bacteria and archaea), and its distribution was never systematically investigated in protists, the evolution of the MADS domain in deep time has remained a mystery.

Given that the diversification of MADS-domain proteins is intimately linked to the origin of evolutionary novelties in multicellular eukaryotes [13], the origin and early evolution of these transcription factors is of great biological interest. Here, we take advantage of the increasing number of whole genome sequences [14] and of progress in bioinformatics tools to elucidate the origin and early diversification of the MADS domain.

Early duplication and several losses of the MADS domain in the evolution of eukaryotes

To study the distribution of MADS domains in extant organisms, we searched 45 eukaryotic species representing the main eukaryotic lineages (see Methods and Table S1 in the supplementary material online). A total of 57 putative MADS-domain sequences, excluding those from the green plant representative A. thaliana, but including 15 sequences that have not yet been annotated (Figure 1a), could be recovered from 24 different species of the major eukaryotic groups. These genes represent a mixture of orthologs and paralogs. Remarkably, no MADS domain was identified in the parasitic organisms Trichomonas vaginalis and Giardia lamblia, representing excavates.
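The presence/absence reasoning in the survey above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The counts are the per-species MADS-box gene numbers quoted earlier in the text, used here only as stand-ins for the study's hit table (Tables S1 and S2, which are not reproduced). The names `mads_counts`, `groups_without_hits` and `species_with_hits` are hypothetical.

```python
from collections import defaultdict

# Per-species counts as quoted in the text; stand-ins for the real hit table.
# Each entry maps a species to (major group, number of MADS-box genes).
mads_counts = {
    "Arabidopsis thaliana": ("plants", 107),
    "Drosophila melanogaster": ("metazoans", 2),
    "Homo sapiens": ("metazoans", 5),
    "Saccharomyces cerevisiae": ("fungi", 4),
    "Trichomonas vaginalis": ("excavates", 0),
    "Giardia lamblia": ("excavates", 0),
}


def groups_without_hits(counts):
    """Return the major groups in which no sampled species has a hit."""
    totals = defaultdict(int)
    for group, n in counts.values():
        totals[group] += n
    return sorted(group for group, total in totals.items() if total == 0)


def species_with_hits(counts):
    """Return the sampled species that carry at least one hit."""
    return sorted(species for species, (_, n) in counts.items() if n > 0)


print(groups_without_hits(mads_counts))
print(len(species_with_hits(mads_counts)))
```

With the real per-species table, the same tally would show whether an absence is confined to one major group or scattered across several, which is the distinction the distribution argument relies on.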
Scanning of the identified MADS-domain sequences against the conserved domains database at the National Center for Biotechnology Information (NCBI) led to the detection of 32 MEF2-like and 25 SRF-like MADS domains (1e-26 ≤ E-value ≤ 5e-2; Table S2 in the supplementary material online). Apart from the absence of MADS domains from excavates, the occasional absence of the MEF2-like MADS domain cannot be attributed to certain major groups of eukaryotes, and the absence of the SRF-like MADS domain can only be ascribed to the major eukaryotic group of chromistans, which is not at the base of the eukaryotic tree [15]. Thus, the distribution of the two types of MADS domains suggests that both of them were present early in the evolution of extant eukaryotes. Our findings extend earlier studies [11] that suggested the existence of these two types of MADS domains in the most recent common ancestor (MRCA) of plants, animals and fungi, but did not consider basal protists.

Corresponding author: Theißen, G. (guenter.theissen@uni-jena.de).

Possible acquisitions and losses of SRF-like and MEF2-like MADS domains were reconstructed onto published phylogenetic trees [15]. Because the interpretation of the early evolution of eukaryotes has remained controversial, we used two alternative rooting hypotheses outlined by Baldauf (Figure 2a) [15]. Assuming that it is more likely to lose a MADS domain in one clade during evolution than to gain one in many clades, the likelihood that the SRF-like and the MEF2-like MADS domain, respectively, were present in the MRCA of extant eukaryotes is well above 0.5 for both types of MADS domains and both rooting hypotheses (Figure 2a; for a more detailed description, see the supplementary material online). Assuming that the topology of at