ARTICLE

Evolution of a Human-Specific Tandem Repeat Associated with ALS

Meredith M. Course,1 Kathryn Gudsnuk,1 Samuel N. Smukowski,1 Kosuke Winston,1 Nitin Desai,1 Jay P. Ross,2,3 Arvis Sulovari,4 Cynthia V. Bourassa,2,5 Dan Spiegelman,2,5 Julien Couthouis,6 Chang-En Yu,7 Debby W. Tsuang,7 Suman Jayadev,1,8 Mark A. Kay,6,9 Aaron D. Gitler,6 Nicolas Dupre,10 Evan E. Eichler,4,11 Patrick A. Dion,2,5 Guy A. Rouleau,2,3,5 and Paul N. Valdmanis1,4,*

Tandem repeats are proposed to contribute to human-specific traits, and more than 40 tandem repeat expansions are known to cause neurological disease. Here, we characterize a human-specific 69 bp variable number tandem repeat (VNTR) in the last intron of WDR7, which exhibits striking variability in both copy number and nucleotide composition, as revealed by long-read sequencing. In addition, greater repeat copy number is significantly enriched in three independent cohorts of individuals with sporadic amyotrophic lateral sclerosis (ALS). Each unit of the repeat forms a stem-loop structure with the potential to produce microRNAs, and the repeat RNA can aggregate when expressed in cells. We leveraged its remarkable sequence variability to align the repeat in 288 samples and uncover its mechanism of expansion. We found that the repeat expands in the 3′-5′ direction, in groups of repeat units divisible by two. The expansion patterns we observed were consistent with duplication events, and a replication error called template switching. We also observed that the VNTR is expanded in both Denisovan and Neanderthal genomes but is fixed at one copy or fewer in non-human primates. Evaluating the repeat in 1000 Genomes Project samples reveals that some repeat segments are solely present or absent in certain geographic populations.
The large size of the repeat unit in this VNTR, along with our multiplexed sequencing strategy, provides an unprecedented opportunity to study mechanisms of repeat expansion, and a framework for evaluating the roles of VNTRs in human evolution and disease.

Introduction

More than 40 tandem repeat expansions in the human genome are known to cause neurological disease.1–3 This number continues to increase with the growing adoption of long-read sequencing technology, which can sequence longer repeats like variable number tandem repeats (VNTRs; repeats with a repeat unit of seven or more nucleotides). Until now, most of the tandem repeats associated with disease have been short tandem repeats (STRs; repeats with a repeat unit of six or fewer nucleotides), and the mechanisms by which disease-associated repeats expand have been difficult to study, since their repeat tracts are generally uninterrupted, and thus their exact locations of expansion are ambiguous. Long-read sequencing technology reveals that many VNTRs are far more polymorphic than the reference human genome suggests. Their length and variability provide us with an unprecedented opportunity to observe their mechanism of expansion.

So far, two VNTRs have been extensively studied in neurological disease: one in ATP binding cassette subfamily A member 7 (ABCA7 [MIM: 605414]) associated with Alzheimer disease (MIM: 104300)4 and one in calcium voltage-gated channel subunit alpha1 C (CACNA1C [MIM: 114205]) associated with schizophrenia (MIM: 181500) and bipolar disorder (MIM: 125480).5 Both of these VNTRs were studied because they were found in close proximity to a genome-wide association study signal for the associated disease.

The high incidence of neurological disease in humans is partly attributed to rapid changes in genes involved in brain function,6–8 and tandem repeats are proposed to contribute to these human-specific traits.9 To better understand the role that tandem repeats could play in human-specific brain health and disease, we took a genome-wide approach, looking for VNTRs that expanded only in humans and exhibited far greater variability in a neurological disease population, as compared to the reference genome.

One neurodegenerative disease in which repeat expansions contribute to a substantial number of cases is amyotrophic lateral sclerosis (ALS [MIM: 105400]). ALS is a rapidly progressive and uniformly fatal motor neuron disease. Currently, the most common variant found in both familial and sporadic cases is an intronic hexanucleotide tandem repeat expansion in C9orf72-SMCR8 complex subunit (C9orf72 [MIM: 614260]).10,11 Another repeat expansion in ataxin 2 (ATXN2 [MIM: 601517]) modifies disease in ALS, when repeat copy number is between 27

1 Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA; 2 Montreal Neurological Institute and Hospital, McGill University, Montreal, QC H3A 2B4, Canada; 3 Department of Human Genetics, McGill University, Montreal, QC H3A 0C7, Canada; 4 Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; 5 Department of Neurology and Neurosurgery, McGill University, Montreal, QC H3A 2B4, Canada; 6 Department of Genetics, Stanford University, Stanford, CA 94305, USA; 7 Geriatric Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, WA 98108, USA; 8 Department of Neurology, University of Washington School of Medicine, Seattle, WA 98195, USA; 9 Department of Pediatrics, Stanford University, Stanford, CA 94305, USA; 10 Neuroscience Axis, CHU de Québec-Université Laval & Department of Medicine, Université Laval, Quebec City, QC G1J 1Z4, Canada; 11 Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA

*Correspondence: paulnv@uw.edu

https://doi.org/10.1016/j.ajhg.2020.07.004.
The American Journal of Human Genetics 107, 1–16, September 3, 2020

Please cite this article in press as: Course et al., Evolution of a Human-Specific Tandem Repeat Associated with ALS, The American Journal of Human Genetics (2020), https://doi.org/10.1016/j.ajhg.2020.07.004

© 2020 American Society of Human Genetics.