Isochore chromosome maps of the human genome

José L. Oliver a,*, Pedro Carpena b, Ramón Román-Roldán c, Trinidad Mata-Balaguer a, Andrés Mejías-Romero a, Michael Hackenberg a, Pedro Bernaola-Galván b

a Departamento de Genética, Instituto de Biotecnología, Universidad de Granada, Granada, Spain
b Departamento de Física Aplicada II, Universidad de Málaga, Málaga, Spain
c Departamento de Física Aplicada, Universidad de Granada, Málaga, Spain

Received 21 December 2001; received in revised form 19 August 2002; accepted 18 September 2002

Abstract

The human genome is a mosaic of isochores, which are long DNA segments (≥300 kbp) that are relatively homogeneous in G+C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and they differ in important features; for example, genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure of the human genome. The isochore maps presented here have four advantages: (1) sequence heterogeneities at different scales are shown in the same plot; (2) all pair-wise compositional differences between adjacent regions are statistically significant; (3) isochore boundaries are defined to single base pair resolution; and (4) both gradual and abrupt isochore boundaries are revealed simultaneously. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and the true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, notably their size distribution, G+C range, and the proportions of the isochore classes.
The relative densities of genes, Alu and long interspersed nuclear element (LINE) repeats, and the different types of single nucleotide polymorphisms (SNPs) on LHGRs also agree with expectations for true isochores. Potential applications of isochore maps range from improving gene-finding algorithms to predicting linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates of the LHGRs identified in all contigs longer than 2 Mb in the human genome sequence are available at the online resource on isochore mapping: http://bioinfo2.ugr.es/isochores.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Isochore maps; Compositional segmentation; Chromosome domains; Comparative genomics; Alus; Long interspersed nuclear elements; Single nucleotide polymorphisms

1. Introduction

The availability of the human genome draft sequence offers an unprecedented opportunity to relate sequence patterns to the chromosome structures revealed by modern molecular cytogenetics, such as chromosome domains or high-resolution chromosome bands. Isochores may be the structures linking these two levels of organization. They are long DNA segments (≥300 kbp), fairly homogeneous in G+C, that were revealed by analytical ultracentrifugation of bulk DNA (Macaya et al., 1976; Bernardi et al., 1985; Bernardi, 1995, 2000). Indeed, isochores have been successfully related to chromosome bands (Saccone et al., 1993).

One conventional way to visualize sequence heterogeneity is the moving-window approach. This simple technique slides a window of arbitrary length along the sequence and computes the G+C content of each window. The procedure dates from the earliest days of sequence analysis, when only short, and often homogeneous, sequences were available.
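The moving-window procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. Several details are assumptions made for the example: the function name `gc_profile`, the default of non-overlapping windows (step equal to window length), and the choice to ignore ambiguous bases such as N when computing each window's G+C fraction.

```python
from typing import List, Optional


def gc_profile(seq: str, window: int, step: Optional[int] = None) -> List[float]:
    """Return the G+C fraction of each window slid along `seq`.

    Hypothetical helper for illustration. The fraction is computed over
    unambiguous bases (A, C, G, T) only, so N's do not dilute the value.
    Windows with no unambiguous bases yield NaN.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if step is None:
        step = window  # assumption: non-overlapping windows by default
    if step <= 0:
        raise ValueError("step must be positive")

    seq = seq.upper()
    profile: List[float] = []
    # Only full-length windows are reported; a trailing partial window is dropped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = sum(chunk.count(base) for base in "ACGT")
        profile.append(gc / acgt if acgt else float("nan"))
    return profile


if __name__ == "__main__":
    # Toy sequence with an abrupt AT-rich / GC-rich boundary at position 20.
    toy = "AAAA" * 5 + "GCGC" * 5
    print(gc_profile(toy, 10))  # small window resolves the boundary
    print(gc_profile(toy, 40))  # large window averages it away
```

On the toy sequence, a 10 bp window returns `[0.0, 0.0, 1.0, 1.0]`, while a 40 bp window returns `[0.5]`. The same boundary either appears or disappears depending on the arbitrary window length. The result is scale-dependent, and that is the limitation of the technique when applied to long, heterogeneous sequences.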
However, with the discovery that eukaryotic genomes are multi-scale complex systems made up of fairly homogeneous isochores of different composition (Macaya et al., 1976; Bernardi et al., 1985; Bernardi, 2000), and with the subsequent finding of long-range correlations in eukaryotic DNA sequences (Li and Kaneko, 1992; Peng et al., 1992; Voss, 1992; Bernaola-Galván et al., 2002a), this

Gene 300 (2002) 117–127. PII: S0378-1119(02)01034-X. www.elsevier.com/locate/gene

* Corresponding author. Departamento de Genética, Facultad de Ciencias, Universidad de Granada, E-18071 Granada, Spain. Fax: +34-958-244073. E-mail address: oliver@ugr.es (J.L. Oliver).

Abbreviations: LHGR, long homogeneous genome region; bp, base pair; kbp, kilobase pair; G+C, guanine plus cytosine content; SNP, single nucleotide polymorphism; MY, millions of years; SINE, short interspersed nuclear element; LINE, long interspersed nuclear element.