In polymorphic genomic regions indels cluster with nucleotide polymorphism: Quantum Genomics

Natalie Longman-Jacobsen, Joseph F. Williamson, Roger L. Dawkins *, Silvana Gaudieri

Centre for Molecular Immunology and Instrumentation, University of Western Australia, P.O. Box 5100, Canning Vale South, Western Australia 6155, Australia

Received 8 October 2002; received in revised form 28 March 2003; accepted 31 March 2003
Received by T. Gojobori

Abstract

Previously, we have described polymorphic frozen blocks (PFBs) within the Major Histocompatibility Complex (MHC) as regions of several hundred kilobases characterised by high nucleotide diversity, little or no recombination, duplicated segments, disease susceptibility, and human endogenous retroviruses. The nucleotide diversity profile within these PFBs shows peaks and troughs outside of the Class I genes, reflecting other important genes (or sequences) in the region. Here we show that indel density is also clustered with similar peaks and troughs. In fact, SNPs and indels are co-located within PFBs.

© 2003 Elsevier Science B.V. All rights reserved.

Keywords: Indels; Single nucleotide polymorphisms (SNPs); Polymorphic blocks; Quantum genomics; Major Histocompatibility Complex

1. Introduction

It is accepted that most genomic sequence variation is due to single nucleotide polymorphisms (SNPs), with the rest attributable to indels (insertions and deletions) of one or more bases, repeat length polymorphisms and rearrangements (The International SNP Working Group, 2001). Whilst it is assumed that these processes occur independently, the relationships between SNPs and indels have not been examined. To do so meaningfully, it is essential to be able to compare sequences of extensive haplotypes. Such sequences are available within the Major Histocompatibility Complex (MHC).
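As an illustration only, the kind of haplotype comparison described above can be sketched as a windowed tally of SNPs and indels along a pairwise alignment. This is a minimal sketch and not the method used in this study: it assumes two haplotype sequences that have already been aligned to equal length, with '-' marking gaps, and the function name `windowed_snp_indel_counts`, the window size and the rule of counting each run of gap columns as one indel event are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def windowed_snp_indel_counts(
    hap_a: str, hap_b: str, window: int
) -> List[Tuple[int, int, int]]:
    """Return (window_start, snp_count, indel_event_count) per alignment window.

    Assumption: hap_a and hap_b are pre-aligned to equal length, with '-'
    marking gap columns. A run of consecutive gap columns in the same
    sequence is counted once, in the window where the run starts.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")

    n_windows = (len(hap_a) + window - 1) // window
    snps = [0] * n_windows
    indels = [0] * n_windows
    prev_gap = None  # which sequence carried a gap in the previous column

    for i, (a, b) in enumerate(zip(hap_a.upper(), hap_b.upper())):
        w = i // window
        if a == "-" and b == "-":
            # Column gapped in both sequences: uninformative, skip it
            # without closing any gap run that is currently open.
            continue
        gap = "a" if a == "-" else "b" if b == "-" else None
        if gap is not None:
            if gap != prev_gap:
                indels[w] += 1  # start of a new indel event
        elif a != b and a in "ACGT" and b in "ACGT":
            snps[w] += 1  # mismatch between two unambiguous bases
        prev_gap = gap

    return [(w * window, snps[w], indels[w]) for w in range(n_windows)]


# Toy input (invented for illustration, not data from this study).
hap_a = "ACGTACGTAC--GTAC"
hap_b = "ACCTACGTACTTGTAA"
print(windowed_snp_indel_counts(hap_a, hap_b, window=8))
# prints [(0, 1, 0), (8, 1, 1)]
```

Plotting the two counts against window position would give SNP and indel density profiles on a common axis, which is the form in which co-location of peaks and troughs can be inspected.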
Previously, we have shown that nucleotide differences are clustered (Gaudieri et al., 1999, 2000) within polymorphic frozen blocks (PFBs), where there are distinct peaks and troughs of nucleotide diversity. Importantly, we do not find single peaks at the specific HLA loci (O’hUigin et al., 2000; Satta et al., 1998), indicating that polymorphism cannot be explained by simple models of selection.

1.1. Characteristics of polymorphic genomic regions

Recent interest in ‘blocks’ within the genome reflects the rediscovery of phenomena that were first described more than 10 years ago, without the need for arrays and somatic hybrids. We observed that blocks are characterised by peaks of polymorphism, frequent duplication and reduced recombination, and therefore suggested the term PFBs (reviewed by Dawkins et al., 1999). Now we confirm that indels are also characteristic of PFBs.

The use of the term ‘block’ is now quite loose, since some authors, such as Patil et al. (2001), seem to be referring only to SNP frequency, which is greatly affected by the sequences selected for comparison and by whether duplications and indels are included in the SNP count. Ancestral haplotypes have diverged to different degrees depending on several factors including lineage, time, duplication and indels. We see the need to define all the actual characteristics of blocks and to recognise that nucleotide diversity plots yield clusters of peaks rather than plateaus. We suspect that the best definition might prove to be based on the

Gene 312 (2003) 257–261. www.elsevier.com/locate/gene
0378-1119/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0378-1119(03)00621-8
* Corresponding author. Fax: +618-9397-1559. E-mail address: cmii@cyllene.uwa.edu.au (R.L. Dawkins).
Abbreviations: DPB, diffuse panbronchiolitis; indel, insertion and deletion; kb, kilobase; Mb, megabase; MHC, Major Histocompatibility Complex; nt, nucleotide; PFB, polymorphic frozen block; SNP, single nucleotide polymorphism.