The Duplication and Intragenic Domain Expansion of Human C2H2 Zinc Finger Genes Are Associated with Transposable Elements and Relevant to the Expression-based Clustering

Wensheng Zhang (1), Andrea Edwards (1), Prescott Deininger (2), Kun Zhang (1)

(1) Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
(wzhang, aedwards, kzhang)@xula.edu

(2) Tulane Cancer Center, Tulane School of Public Health and Tropical Medicine, New Orleans, LA 70112, USA
pdeinin@tulane.edu

Abstract

C2H2 zinc finger (ZNF) protein-coding genes constitute the largest class of transcription factors in humans. ZNF proteins perform regulatory functions by binding DNA through their zinc finger regions. The number of zinc finger repeats varies substantially among ZNF proteins, from 1 to more than 30. About 40% of human C2H2-ZNF genes reside on chromosome 19 (HSA19), the most Alu-enriched chromosome, where they show a clustered organization that reflects the tandem duplication of ancestral genes during evolution. In this study, we showed that (1) duplications of C2H2-ZNF genes preferentially occurred in genomic regions enriched with Alu and L1 retrotransposons and endogenous retroviruses (ERVs); and (2) the within-gene number and density of ZNF motifs are positively correlated (p < 1.36e-9) with the densities of Alu and L1 elements. By analyzing linkage disequilibrium measures for SNP data, we excluded Alu- or L1-mediated recombination or gene conversion as a mechanism for the expansion of C2H2-ZNF motifs. Moreover, we demonstrated that the expression-based grouping of C2H2-ZNF genes is associated with both the genomic locus-based clustering and the human-mouse synteny-based binary classification. In particular, we identified a set of 24 lowly expressed C2H2-ZNF genes, consisting mainly of singletons without proximal duplications.
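The abstract relates the within-gene number of ZNF motifs to Alu and L1 density but does not say how the motifs were counted. Purely as an illustration, the sketch below counts C2H2 motifs in a protein sequence with a regular expression built from the PROSITE-style consensus C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H (PROSITE PS00028). The choice of pattern and the helper name `count_c2h2_motifs` are assumptions, not the authors' pipeline, and the toy sequence is synthetic.

```python
import re

# Assumption: PROSITE-style C2H2 consensus (PS00028),
#   C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H,
# rewritten as a Python regex over one-letter amino-acid codes.
C2H2_MOTIF = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def count_c2h2_motifs(protein: str) -> int:
    """Return the number of non-overlapping C2H2 motifs in a protein sequence.

    Hypothetical helper for illustration; not the method used in the paper.
    """
    return sum(1 for _ in C2H2_MOTIF.finditer(protein.upper()))


# Synthetic example: three copies of one finger joined by TGEKP linkers.
finger = "YKCPECGKSFSQSSNLQKHQRTH"
toy_protein = "M" + (finger + "TGEKP") * 3
print(count_c2h2_motifs(toy_protein))  # 3
```

Because `re.finditer` scans left to right and returns non-overlapping matches, each finger in a tandem array separated by short linkers is counted once.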
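The abstract also relies on linkage disequilibrium (LD) measures computed from SNP data without naming them. For reference, the two standard pairwise measures for biallelic SNPs, Lewontin's D' and r², follow from one haplotype frequency and two allele frequencies: D = p_AB − p_A·p_B, D' = |D| / D_max, and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The sketch below implements these textbook definitions; which measure the authors actually used, and the name `pairwise_ld`, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LD:
    d: float        # raw disequilibrium coefficient D
    d_prime: float  # Lewontin's D' = |D| / Dmax
    r2: float       # squared allelic correlation r^2


def pairwise_ld(p_ab: float, p_a: float, p_b: float) -> LD:
    """Pairwise LD for two biallelic SNPs (hypothetical helper).

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of alleles A and B, each strictly between 0 and 1
    """
    d = p_ab - p_a * p_b
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    # Dmax depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * q_b, q_a * p_b)
    else:
        d_max = min(p_a * p_b, q_a * q_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * q_a * p_b * q_b)
    return LD(d, d_prime, r2)


print(pairwise_ld(0.5, 0.5, 0.5))   # complete LD: d_prime = 1.0, r2 = 1.0
print(pairwise_ld(0.25, 0.5, 0.5))  # independent SNPs: d = 0.0
```

D' equals 1 whenever at least one of the four haplotypes is absent, whereas r² also depends on the allele frequencies, which is why the two measures are commonly reported together.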
1 Introduction

Many zinc finger (ZNF) proteins participate in the regulation of downstream gene expression, usually by binding DNA through their zinc finger regions. C2H2-ZNF genes constitute the second largest gene family in mammals, and the ZNF region of the proteins they encode is composed of a basic structural unit (motif) of 28 amino acids, often repeated in tandem. The number of zinc finger repeats varies substantially among ZNF proteins, from one to more than thirty. In mammals, tandem ZNF genes that show a clustered organization on chromosomes have been gained through an ongoing process of lineage-specific duplication and divergence [1-2]. Tandem ZNF genes are dominated by those with a KRAB domain, which is generally thought to be involved in transcriptional repression. Most KRAB zinc finger (KZNF) proteins in mammals consist of an N-terminal KRAB domain followed by a single ZNF motif or an array of ZNF motifs. Throughout vertebrates, tandem ZNF gene expansion is characterized by strong positive selection that has changed the number and DNA-binding specificity of zinc fingers but retained a conserved KRAB domain [3]. The total number of genes containing at least three ZNF motifs varies from 108 in chicken to 477 in human [3]. The evolutionary patterns of ZNF genes across lineages, combined with remarkably little functional information, have raised a set of long-standing questions: what are the organismal functions of tandem ZNF genes, and why are there so many of them? Previous work generally supports the view that adaptive evolution drives the expansion and diversification of zinc finger domains [3-5]. Recently, a study [6] showed that the appearance of new families of endogenous retroviruses is strongly predictive of the appearance of new duplicate KZNF genes. Based on this finding and the results of molecular biology experiments, its authors proposed a more concrete hypothesis in the form of a host-pathogen model, in which
most vertebrate tandem ZNF genes evolved to repress retroviral or LTR retroelement activity [6].

978-1-880843-99-4 / copyright ISCA, BICOB 2015 March 9-11, 2015, Honolulu, Hawaii, USA

About 40% of human C2H2-ZNF genes (n ≈ 700) reside on chromosome 19 (HSA19), mostly organized in six large tandem gene clusters [7-9]. HSA19 is also characterized by an enrichment of Alu retrotransposons, the most populous transposable elements (TEs) in primates. The density of Alu elements on HSA19 is nearly