Short Communication

Using DNA sequencing electrophoresis compression artifacts as reporters of stable mRNA structures affecting gene expression

Divya Kapoor, Sanjeev Kumar Chandrayan, Shubbir Ahmed, Purnananda Guptasarma

Division of Protein Science and Engineering, Institute of Microbial Technology (IMTECH), Chandigarh, India

Received May 16, 2007; Revised June 18, 2007; Accepted June 19, 2007

The formation of secondary structure in oligonucleotide DNA is known to cause "compression" artifacts in electropherograms produced during DNA sequencing. Separately, the formation of secondary structure in mRNA is known to suppress translation, particularly when such structures form in a region covered by the ribosome during, or shortly after, translation initiation. Here, we demonstrate how a DNA sequencing compression artifact provides important clues to the location(s) of translation-suppressing secondary structural elements in mRNA. Our study involves an engineered version of a gene from Rhodothermus marinus that encodes the enzyme Cel12A. We introduced this gene into Escherichia coli with the intention of overexpressing it, but found that it expressed extremely poorly. Intriguingly, the gene displayed a remarkable compression artifact during DNA sequencing electrophoresis. Selected "designer" silent mutations destroyed the artifact. They also greatly enhanced the expression of the cel12A gene, presumably by destroying stable mRNA structures that otherwise suppress translation. We propose that this method of finding problematic mRNA sequences is superior to software-based analyses, especially if combined with low-temperature CE.
Keywords: Compression artifact / DNA sequencing electrophoresis / Nucleic acid secondary structure
DOI 10.1002/elps.200700359
Electrophoresis 2007, 28, 3862–3867

It is widely appreciated that electrophoretic separation of oligonucleotide (oligo) populations at single-nucleotide resolution is critical for DNA sequencing. It is also known that compression artifacts arising from secondary structure formation in DNA (e.g., hairpins or stem-loops) produce striking anomalies in oligo separations that frustrate DNA sequencing [1–3]. Figure 1A shows such an anomaly, culled out from the middle of a sequence readout of engineered cel12A. Because the dye blobs that are sometimes seen in sequencing electropherograms at read lengths of about 70 nt happened to overlap the latter part of the anomalous sequence, we read the sequence manually by examining the colored electropherograms corresponding to the four DNA bases. The manually read sequence, beginning with and including the BamHI site (GGATCC), is:

"GGATCC ACTGTTGAGTCGGGTGGG ACACGAGAACGG"

The sequence should, however, actually have been:

"GGATCC ACTGTCGAGCTGTTCGGACAATGGG ACACGAGAACGG"

In these sequences, the anomalous region is flanked on either side by 11 or 12 correctly sequenced nucleotides. The anomalous sequence is clearly both compressed relative to the correct sequence (13 nt are read between the flanks in place of 20 nt) and wrong in its detail.

We were extremely intrigued by this anomaly, which was reproducibly observed in a number of different sequencing reactions involving different clones of the gene. What interested us especially was that the anomaly was located very close to the 5′-end of the gene, which had been inserted between the BamHI and HindIII restriction sites of the expression vector pQE-30 (Qiagen).
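The comparison of the two sequences above can be made explicit with a short script. The following is a minimal Python sketch, not part of the original study: it assumes the 11 nt upstream flank is GGATCCACTGT and the 12 nt downstream flank is ACACGAGAACGG (inferred from the text's "11 or 12 correctly sequenced nucleotides"), and the helper names `split_flanks`, `is_subsequence`, and `edit_distance` are hypothetical. It checks whether the misread core could have arisen by dropped bases alone, which would be a pure compression, or whether it also contains miscalled bases.

```python
"""Sketch: quantify the compression artifact quoted in the text.

Assumption (hypothetical split): the upstream flank is the first 11 nt and the
downstream flank the last 12 nt of each sequence.
"""

# Sequences as quoted in the text; spaces are only visual separators.
MANUAL_READ = "GGATCC ACTGTTGAGTCGGGTGGG ACACGAGAACGG".replace(" ", "")
CORRECT_SEQ = "GGATCC ACTGTCGAGCTGTTCGGACAATGGG ACACGAGAACGG".replace(" ", "")

UPSTREAM_FLANK = 11    # assumed: BamHI site GGATCC + ACTGT
DOWNSTREAM_FLANK = 12  # assumed: ACACGAGAACGG


def split_flanks(seq, up=UPSTREAM_FLANK, down=DOWNSTREAM_FLANK):
    """Return (upstream flank, anomalous core, downstream flank)."""
    return seq[:up], seq[up:len(seq) - down], seq[len(seq) - down:]


def is_subsequence(short, full):
    """True if `short` can be obtained from `full` by deleting bases only."""
    it = iter(full)
    # `base in it` advances the iterator, so order is respected.
    return all(base in it for base in short)


def edit_distance(a, b):
    """Levenshtein distance: substitution, insertion, deletion each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


read_up, read_core, read_down = split_flanks(MANUAL_READ)
true_up, true_core, true_down = split_flanks(CORRECT_SEQ)

# The flanks should agree between the manual read and the correct sequence.
assert read_up == true_up and read_down == true_down

print("misread core:", read_core, len(read_core))
print("true core:   ", true_core, len(true_core))
print("bases lost to compression:", len(true_core) - len(read_core))
print("pure deletion artifact?", is_subsequence(read_core, true_core))
print("edit distance:", edit_distance(read_core, true_core))
```

Under these assumptions the misread core is 13 nt against 20 nt in the correct sequence, so 7 bases are lost. Because the misread core is not a subsequence of the true core, the edit distance must exceed 7, which is consistent with the observation that the read is wrong in its detail and not merely shortened.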
The BamHI site itself consists of two codons (GGA and TCC), which encode the eleventh and twelfth residues, respectively, of a 12-residue affinity tag with the sequence "MRGSHHHHHHGS". The tag's coding sequence is separated from the 3′-end of the ribosome-binding site (RBS, also known as the Shine–Dalgarno sequence) by nine bases. This places the compression artifact 50 bases downstream of the RBS (9 spacer bases, plus 36 bases encoding the 12-residue tag, plus the five correctly read bases, ACTGT, that follow the BamHI site).

Correspondence: Dr. Purnananda Guptasarma, Division of Protein Science and Engineering, Institute of Microbial Technology (IMTECH), Sector 39-A, Chandigarh 160036, India
E-mail: pg@imtech.res.in
Fax: +91-172-2690585

Abbreviation: oligo, oligonucleotide

© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.electrophoresis-journal.com

To examine whether the compression anomaly could reflect the potential for formation of a secondary structural element in mRNA that would halt, or adversely affect, the