SHORT COMMUNICATION

Frequency and Polymorphism of Simple Sequence Repeats in a Contiguous 685-kb DNA Sequence Containing the Human T-Cell Receptor β-Chain Gene Complex

PATRICK CHARMLEY,*,†,1 PATRICK CONCANNON,*,†,2 LEROY HOOD,†,‡ AND LEE ROWEN‡

*Virginia Mason Research Center, 1000 Seneca Street, Seattle, Washington 98101; and the Departments of †Immunology and ‡Molecular Biotechnology, University of Washington School of Medicine, Seattle, Washington 98195

Received March 29, 1995; accepted July 26, 1995

The human T-cell receptor β-chain (TCRB) gene complex spans 575 kb in chromosome region 7q35 and has been the subject of a large-scale DNA sequencing effort. A contiguous 685-kb DNA sequence from this region was searched by computer analysis for the occurrence of simple sequence repeats (microsatellites) with core sequence lengths of 2–5 nucleotides. Twenty-nine such microsatellites of repeat number n ≥ 9 were found, with the majority being dinucleotide repeats. By PCR analysis, 19 were found to be polymorphic in repeat number, thus averaging one per 36 kb. These polymorphic di-, tri-, and tetranucleotide repeats had between 3 and 15 differently sized alleles each. The potential usefulness of these TCRB microsatellites for detecting disease susceptibility alleles was examined by measuring the linkage disequilibrium between these markers and flanking biallelic mutations. All but 4 microsatellites (79%) demonstrated significant linkage disequilibrium (P < 0.0001). This present study highlights the utility and potential outcomes of large-scale DNA sequencing for the identification of polymorphic simple sequence repeats. © 1995 Academic Press, Inc.

In the past, DNA microsatellites have been identified and isolated primarily by hybridization screening of cloned DNA using oligonucleotide probes (19). This approach has been highly successful since the stringency of the hybridization can be adjusted to reveal predominantly longer (e.g., n ≥ 12 for dinucleotide repeats) stretches of these short tandem repeats, which are the most likely to be polymorphic (11, 18). As the technology for obtaining large continuous sequences (contigs) of genomic DNA improves, this information will offer an alternate approach to choosing SSRs for polymorphism analysis. For regions of particular interest, microsatellite analysis based on the complete DNA sequence of a region allows one to choose among the available SSRs with the ultimate in precision.

The complete DNA sequence of the human T-cell receptor β-chain (TCRB) gene complex has recently been determined (L. Rowen et al., manuscript in preparation). As of GenBank release R86, this 685-kb contig represented the longest stretch of DNA yet reported from the human genome. Therefore, this TCRB sequence provides an opportunity to test the potential outcomes of using large stretches of genomic DNA for developing microsatellite polymorphisms for the fine genetic mapping of disease genes.

Genotypes were collected using the polymerase chain reaction from 72–75 Centre d'Etude du Polymorphisme (CEPH) family Caucasian parental DNAs. From these genotypes, allele and observed heterozygosity frequencies were calculated. The computer program ASSOC (13) was used to calculate the deviation of the multiallelic microsatellite genotype frequencies from Hardy–Weinberg expectations. Two-locus linkage disequilibrium was assessed by using haplotypes for which phase could be determined, using the genotypes collected (i.e., from individuals who were homozygous at both loci or heterozygous at not more than one of the loci). Estimations of the "overall" linkage disequilibrium between the microsatellites and certain biallelic polymorphisms were calculated using a χ² statistic to compare the observed haplotype frequencies with the haplotype frequencies expected based on random association at the two loci, with (r − 1)(c − 1) degrees of freedom (17). Classes with expected values of <5 were combined to avoid inflated statistical differences. To test separately the level of linkage disequilibrium for individual microsatellite alleles ("allelic LD"), each allele was separately compared to all other microsatellite alleles combined, using a 2 × 2 χ² analysis. To compensate for the large number of pairwise analyses performed, the alpha level of statistical significance was lowered to P < 0.0001.

The 685-kb contig of human TCRB DNA sequence

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. L36092.
1 Current address: Darwin Molecular Corp., Bothell, WA 98021.
2 To whom correspondence should be addressed at the Virginia Mason Research Center, 1000 Seneca Street, Seattle, WA 98101. Fax: (206) 223-7543. E-mail: patcon@u.washington.edu.

GENOMICS 29, 760–765 (1995). 0888-7543/95 $12.00. Copyright © 1995 by Academic Press, Inc. All rights of reproduction in any form reserved.
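The overall and allelic linkage disequilibrium tests described in the methods above can be illustrated with a minimal Python sketch. It is not the authors' software. The haplotype counts are invented toy data, the function names are hypothetical, and the pooling rule (merging every microsatellite allele that has any expected cell count below 5 into one combined class) is only one plausible reading of "classes with expected values of <5 were combined".

```python
import math

ALPHA = 0.0001  # significance threshold stated in the text


def expected_counts(table):
    """Expected haplotype counts under random association of the two loci."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[rt * ct / n for ct in col_tot] for rt in row_tot]


def pool_sparse_rows(table, min_expected=5.0):
    """Merge rows (microsatellite alleles) with any expected count below
    min_expected into one combined row. Assumed pooling rule, single pass."""
    kept, pooled = [], None
    for row, erow in zip(table, expected_counts(table)):
        if min(erow) < min_expected:
            pooled = list(row) if pooled is None else [a + b for a, b in zip(pooled, row)]
        else:
            kept.append(list(row))
    if pooled is not None:
        kept.append(pooled)
    return kept


def chi_square(table):
    """Pearson chi-square statistic and (r - 1)(c - 1) degrees of freedom.
    Assumes every row and column total is nonzero."""
    exp = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for row, erow in zip(table, exp)
               for o, e in zip(row, erow))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def allelic_ld(table, i):
    """2 x 2 test of allele i against all other microsatellite alleles combined.
    For 1 degree of freedom the P value is erfc(sqrt(stat / 2))."""
    focal = list(table[i])
    rest = [sum(col) - f for col, f in zip(zip(*table), focal)]
    stat, _ = chi_square([focal, rest])
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Toy phase-known haplotype counts: rows are microsatellite alleles,
# columns are the two alleles of a flanking biallelic polymorphism.
haplotypes = [[30, 10], [10, 30], [20, 20], [4, 2], [3, 4]]

pooled = pool_sparse_rows(haplotypes)
overall_stat, overall_df = chi_square(pooled)
print("overall chi-square:", round(overall_stat, 2), "df:", overall_df)

for i in range(len(pooled)):
    stat, p = allelic_ld(pooled, i)
    print("allele class", i, "chi-square:", round(stat, 2), "significant:", p < ALPHA)
```

The overall test reports only the statistic and its degrees of freedom, because a P value for more than 1 degree of freedom needs the incomplete gamma function, which the Python standard library does not provide. A statistics package would supply it in practice.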