Cloning, Characterization, and the Complete 56.8-Kilobase DNA Sequence of the Human NOTCH4 Gene

Linheng Li,*,1 Guyang M. Huang,1,2 Amy B. Banta,* Yu Deng, Todd Smith,§ Penny Dong, Cynthia Friedman, Lei Chen, Barbara J. Trask, Thomas Spies, Lee Rowen, and Leroy Hood*,3

* Stowers Institute for Medical Research; Department of Molecular Biotechnology, University of Washington, Seattle, Washington 98195; Institute of Genetics, Fudan University, Shanghai, 200433 People’s Republic of China; §Geospiza, Inc., Bioinformatics Consulting and Contracting, 2442 NW Market Street, Suite 344, Seattle, Washington 98107; and Fred Hutchinson Cancer Research Center, Clinical Research Division, Seattle, Washington 98104

Received January 9, 1998; accepted April 6, 1998

The human NOTCH4 gene, including a putative promoter region and 30 exons spanning 56.8 kb of DNA, was sequenced; this is the first complete mammalian genomic sequence reported thus far in the Notch gene family. The NOTCH4 locus contains a TATA-less promoter with two putative transcription initiation sites (Inr), three RBP-Jκ sites, and two GATA recognition sites. Two cDNA isoforms, NOTCH4(L) and NOTCH4(S), were identified. Whereas the NOTCH4(S) isoform contains the entire coding sequence, the NOTCH4(L) isoform has two unspliced intronic sequences, one between exons 11 and 12 and one between exons 20 and 21, and a misspliced exon 6. Consistent with these results, two alternatively spliced transcripts of approximately 9.3 and 6.7 kb were detected by Northern blot analysis. The predicted amino acid sequence of the NOTCH4 protein, based on the NOTCH4(S) cDNA sequence, contains 2003 amino acids and includes the predominant motifs of the Notch family: 29 epidermal growth factor (EGF)-like repeats, 3 Notch/lin-12 repeats, a transmembrane region, 6 cdc10/Ankyrin repeats, and a PEST domain.
© 1998 Academic Press

INTRODUCTION

The human major histocompatibility complex (MHC or HLA) locus is located on the short arm of chromosome 6 and spans approximately 3.5–4.0 Mb (Milner and Campbell, 1992). This locus includes three distinct regions: MHC classes I, II, and III. The class I and II regions encode highly polymorphic MHC proteins that are involved in antigen presentation during immune responses, and they also contain genes encoding a variety of other products (Beck et al., 1992; Campbell and Trowsdale, 1993; Spies et al., 1989). The class III region, which is known to be extremely gene-dense, spans about 1.1 Mb and is flanked by the class I HLA-B and class II HLA-DRA loci (Milner and Campbell, 1992). To elucidate the structural organization of the class III region, we have undertaken its genomic sequence analysis. A novel Notch gene near the centromeric end of the class III locus was identified during the course of this sequence analysis. Sugaya and co-workers reported the identification of a Notch gene at this locus, but did not complete the analysis (Sugaya et al., 1994).

Notch was first identified by its role in regulating the segregation of neuroblasts from ectodermal cells in Drosophila (Artavanis-Tsakonas and Simpson, 1991; Artavanis-Tsakonas et al., 1995; Fortini et al., 1993a; Kimble and Simpson, 1997). Notch is also essential for eye, wing, bristle, egg chamber, and mesoderm development (Cagan and Ready, 1989; Fehon et al., 1991; Fortini et al., 1993b; Hartenstein and Posakony, 1990; Heitzler and Simpson, 1991; Palka et al., 1990; Ruohola et al., 1991; Xu et al., 1992).
Significant progress has been made in identifying homologues of Drosophila Notch in a variety of organisms, including Caenorhabditis elegans [Lin-12 and Glp-1 (Yochem and Greenwald, 1989)], Xenopus (Coffman et al., 1990), mouse (Franco del Amo et al., 1992; Kopan and Weintraub, 1993; Lardelli et al., 1994; Reaume et al., 1992), rat (Weinmaster et al., 1991, 1992), and human (Ellisen et al., 1991; Blaumueller et al., 1997). Most vertebrates have multiple Notch genes, with each of the products carrying out related but distinct cell surface receptor func-

Sequence data described in this paper have been deposited with the EMBL/GenBank Data Libraries under Accession No. U89335 for the genomic sequence and Accession No. U95299 for the cDNA sequence of human NOTCH4.
1 L. Li and G. M. Huang made equal contributions to this work.
2 Present address: Pangea Systems Inc., 1999 Harrison Street, Suite 1100, Oakland, CA 94612.
3 To whom correspondence should be addressed at Department of Molecular Biotechnology, University of Washington, Box 357730, Seattle, WA 98195. Telephone: (206) 616-5104. Fax: (206) 685-7301.

GENOMICS 51, 45–58 (1998), Article No. GE985330
0888-7543/98 $25.00. Copyright © 1998 by Academic Press. All rights of reproduction in any form reserved.