Highly polymorphic repeat region in the CETP promoter induces unusual DNA structure

Maruja E. Lira, David B. Lloyd, Shawn Hallowell, Patrice M. Milos, John F. Thompson*

Genomic and Proteomic Sciences, Pfizer Global Research and Development, Mail Stop 8118D-3069, Eastern Point Road, Groton, CT 06340, USA

Received 1 April 2004; received in revised form 19 May 2004; accepted 3 June 2004
Available online 26 June 2004

Abstract

Genetic variation in the human cholesteryl ester transfer protein (CETP) promoter is associated with HDL cholesterol levels and cardiovascular disease, and much of the genetic variation in CETP has been attributed to the promoter region. In this region, there are several single nucleotide polymorphisms as well as a variable-length tandem repeat, located 1946 base pairs upstream of the CETP transcription start site, that is highly polymorphic in both length and sequence. There are more than 10 different long alleles, and these vary in their repeat structure. We find that the short allele of this repeat is associated with high HDL cholesterol levels in vivo (P < 0.0001). In males, this association is independent of the nearby −629 polymorphism. In addition, the variable-length GAAA repeat can stimulate an adjacent GGGGA repeat to form a structure that hinders DNA amplification and sequencing. This structure also has an effect in vivo, as shown by orientation effects and cloning efficiency in Escherichia coli.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Genetic association; HDL; CETP; DNA structure

1. Introduction

Cholesteryl ester transfer protein (CETP) has long been known to be a key player in high-density lipoprotein (HDL) metabolism, and thus knowledge of how its mass and activity are controlled is important for understanding HDL dynamics (reviewed in Ref. [1]). The absence of CETP in many rodent models makes it difficult to study in those systems.
As a result, much effort has been expended on genetic studies in humans to better understand CETP’s role in affecting lipids and disease. One of the first genetic polymorphisms to be identified was the TaqIB RFLP [2], which was later localized to intron 1 of the CETP gene. Although the TaqIB SNP is associated with CETP and HDL cholesterol (HDL-C) levels [3], there is no obvious functional significance for this sequence variation. This SNP is in high linkage disequilibrium with polymorphisms in the promoter region that, a priori, appear more likely to play a functional role [3–6].

The magnitude of the effects of CETP promoter variation is small relative to the effects seen in most systems examined in vitro. In vivo, the impact on CETP mass is typically 25% or less [3] and has been associated with a 10–20% alteration of HDL-C, a difference that, although small, can have a substantial impact on disease and is thus highly significant from a biological perspective.

Previous studies have suggested that the SNP at −629, the variable number of tandem repeats (VNTR) at −1946, or possibly both may be causally responsible for the observed TaqIB association with CETP levels [4–9]. The −629 SNP is located within an Sp1/Sp3 binding site and may mediate an effect on transcription via altered protein binding [4,10]. The −629 SNP affects transcription by only 25% in vitro [10], but the small magnitude of this effect is entirely consistent with the observed effect on CETP in vivo.

The VNTR at −1946 has not been directly examined for an effect on transcription, although deletion of a 450-bp region including the VNTR had minimal effect on reporter activity [11]. The VNTR consists of multiple GAAA repeats adjacent to a complex pattern of other purines that varies in

1388-1981/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.bbalip.2004.06.002
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.bbalip.2004.06.002.
* Corresponding author. Tel.: +1-860-441-5139; fax: +1-860-441-0436.
E-mail address: john_f_Thompson@groton.pfizer.com (J.F. Thompson).
www.bba-direct.com
Biochimica et Biophysica Acta 1684 (2004) 38–45