GENOMICS 42, 55–66 (1997)
ARTICLE NO. GE974708

Nucleotide Sequence Analysis of the HLA Class I Region Spanning the 237-kb Segment around the HLA-B and -C Genes

Nobuhisa Mizuki,*,† Hitoshi Ando,‡ Minoru Kimura,* Shigeaki Ohno,† Shoji Miyata,* Masaaki Yamazaki,§ Hiroyuki Tashiro,§ Koji Watanabe,§ Ayako Ono,§ Susumu Taguchi,§ Chiyo Sugawara,§ Yasuhito Fukuzumi,§ Katsuzumi Okumura,¶ Kaori Goto,*,† Mami Ishihara,† Satoshi Nakamura,† Junichi Yonemoto,† Yara Yukie Kikuti,* Takashi Shiina,* Lei Chen,* Asako Ando,* Toshimichi Ikemura, and Hidetoshi Inoko*,1

*Department of Genetic Information, Division of Molecular Life Science, Tokai University School of Medicine, Bohseidai, Isehara, Kanagawa 259-11, Japan; †Department of Ophthalmology, Yokohama City University School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa 236, Japan; ‡Kanagawa Shonan Red Cross Blood Center, 1837 Aiko, Atsugi, Kanagawa 243, Japan; §Bioscience Research Laboratory, Fujiya Co., Ltd., 228 Soya, Hadano, Kanagawa 257, Japan; ¶Bioscience Laboratory, Mie University School of Bioresource, 1515 Kamihama-cho, Tsu, Mie 514, Japan; and Department of Evolutional Genetics, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411, Japan

Received November 25, 1996; accepted March 6, 1997

To elucidate the detailed gene organization of the human leukocyte antigen (HLA) class I region on chromosome 6, seven contiguous cosmid genomic clones covering the 237-kb segment around the HLA-B and -C loci were subjected to DNA sequencing by the shotgun strategy to give a single contig of 236,822 bp from the MICA gene (58.2 kb centromeric of HLA-B) to 90.8 kb telomeric of HLA-C. This region was confirmed to contain four known genes, MICA, HLA-17, HLA-B, and HLA-C, from centromere to telomere. Further, a new member of the P5 multicopy genes was found to be about 1.3 kb upstream of the HLA-17 gene and designated P5-8. Five novel genes designated NOB1–5 were identified by RT-PCR and Northern blot hybridization. In addition, two pseudogenes, dihydrofolate reductase pseudogene (DHFRP) and ribosomal protein L3 homologous gene (RPL3-Hom), were also found in the vicinity of the HLA-B and -C genes, respectively. The two segments (about 40 kb) downstream of the HLA-B and HLA-C genes showed high sequence homology to each other, suggesting that segmental genome duplication including the major histocompatibility complex (MHC) class I gene must have occurred during the evolution of the MHC. © 1997 Academic Press

Sequence data reported in this paper have been deposited with the DDBJ, EMBL and GenBank nucleotide sequence Data Libraries under Accession Nos. D83543, D83769, D83770, D83771, D83956, D83957, and D84394.

1 To whom correspondence should be addressed. Telephone: 81-463-93-1121 (ext. 2312). Fax: 81-463-94-8884. E-mail: hinoko@is.icc.u-tokai.ac.jp.

0888-7543/97 $25.00 Copyright 1997 by Academic Press. All rights of reproduction in any form reserved.

INTRODUCTION

The human major histocompatibility complex (MHC) encodes highly polymorphic leukocyte antigens (HLA) responsible for antigen presentation to T cells. The HLA gene complex is located on the short arm of chromosome 6 within 6p21.3 and covers an about 4-Mb (4000 kb) segment that seems to be generated through repeated gene duplication and conversion during evolution. The HLA region is conventionally divided into three areas, class I (2 Mb), class III (1 Mb), and class II (1 Mb), from telomere to centromere. The HLA class I and class II antigens involved in the genetic control of the immune response are encoded by the corresponding regions (Campbell and Trowsdale, 1993). There are currently 19 HLA or HLA-like expressed genes, more than 80 non-HLA expressed genes, and 25 pseudogenes or gene fragments localized within the HLA region. On average, there is 1 gene detected at least every 30–40 kb in the HLA region. It is hard to predict whether the gene density observed in the HLA region is remarkably high since other regions of the human genome have not yet been characterized in sufficient detail. It is notable that many non-HLA genes involved or not involved in the immune response are located in the HLA region, although the functions of most of these genes still remain uncertain.

There are fewer genes so far identified in the class I region than in the class II or class III regions, especially in the gene-dense region of approximately 700 kb between the INT3 and BAT1 genes, where more than 40 expressed genes have been identified (Campbell and Trowsdale, 1993). This is mainly because the class I region has not been studied so extensively, but it seems likely that the number of genes within the class I region will continue to increase as more sophisticated means of detecting coding sequences become available. Furthermore, many diseases such as Behçet disease, anky-