Interactions in Oligonucleotide Hybrid Duplexes on Microarrays

Hans Binder,* Toralf Kirsten,† Ivo L. Hofacker,‡ Peter F. Stadler,†,§ and Markus Loeffler†,|

Interdisciplinary Centre for Bioinformatics, University of Leipzig, Institute of Theoretical Chemistry and Structural Biology, University of Vienna, Bioinformatics group, Department of Computer Science, and Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Kreuzstrasse 7b, D-4103 Leipzig, Germany

Received: January 29, 2004; In Final Form: August 23, 2004

We investigated Affymetrix GeneChip intensity data in terms of chip-averaged sensitivities over all perfect match (PM) and mismatch (MM) probes possessing a common triple of neighboring bases in the middle of their sequence. This approach provides a model-independent estimation of base-specific contributions to the probe sensitivities. We found that fluorescent labels attached to nucleotide bases forming Watson-Crick (WC) pairs in most cases decrease their binding affinity and, thus, decrease the sensitivity of the probe. Single-base-related mean sensitivity values rank in the order C > G ≈ T > A. The central base of PM and MM probes mainly forms WC pairings in duplexes with nonspecific transcripts, which evidently dominate the chip-averaged sensitivity values. Linear combinations of the triple-averaged probe sensitivities provide nearest-neighbor (NN) sensitivity terms, which rank in an order similar to that of the respective NN free-energy terms obtained from previous thermodynamic studies on the stability of RNA/DNA duplexes in solution. Systematic deviations between the two data sets can be mostly attributed to the labeling of the target RNA in the chip experiments. Our results provide a set of molecular NN and single-base-related interaction parameters that account for specific properties of duplex formation in microarray hybridization experiments.
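The triple averaging described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the input layout (pairs of sequence and sensitivity), and the assumption that per-probe sensitivities have already been computed are all hypothetical. It assumes 25-mer probes, so the middle triple is the middle base together with its two neighbors.

```python
from collections import defaultdict

PROBE_LENGTH = 25            # GeneChip probes are 25-mers
MIDDLE = PROBE_LENGTH // 2   # 0-based index of the middle base (12)


def middle_triple(sequence):
    """Return the three bases centered on the middle position of a probe."""
    if len(sequence) != PROBE_LENGTH:
        raise ValueError("expected a 25-mer probe sequence")
    return sequence[MIDDLE - 1:MIDDLE + 2]


def triple_averaged_sensitivity(probes):
    """Average sensitivities over all probes sharing the same middle triple.

    `probes` is an iterable of (sequence, sensitivity) pairs; this layout
    is an assumption made for illustration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sequence, sensitivity in probes:
        triple = middle_triple(sequence)
        sums[triple] += sensitivity
        counts[triple] += 1
    return {triple: sums[triple] / counts[triple] for triple in sums}
```

In practice the averages would be taken separately for PM and MM probes over a whole chip; the sketch leaves that grouping to the caller.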
Introduction

Target binding to high-density oligonucleotide microarrays used for gene expression experiments is governed by the molecular interactions in the hybrid duplexes formed by RNA fragments and DNA probes. Knowledge of the details of DNA/RNA hybridization behavior on a molecular level, and its estimation by means of effective parameters, is one prerequisite for selecting optimal probe sequences from target genes for newly designed chips. Short oligonucleotides in particular might be ineffective as RNA binders as a result of relatively weak interactions between probe and target. Existing methods for chip design mostly optimize probe sequences using thermodynamic criteria based on interaction parameters that refer to hybrid duplexes in solution (see refs 1, 2 and references therein). Recent analyses show that several factors, such as the presence of fluorescent labels, modify the stability of RNA/DNA duplexes on microarrays compared with duplexes in solution. 3,4 Understanding the hybridization properties of microarray probes presumably requires a modified view of the molecular interactions in DNA/RNA duplexes, one that takes into account labeling and also, possibly, effects due to the fixation of the probes at the quartz surface.

Available microarray intensity data are directly related to the binding affinity of the individual probes. 4 They therefore provide valuable information about molecular interactions in RNA/DNA duplexes, which can be used to extract relevant interaction parameters. In this work, we make use of two types of redundancy in the design of Affymetrix GeneChip microarrays, which were introduced to improve the reliability of the method. 5,6 First, so-called probe sets consisting of 11-20 different reporter probes for each gene allow us to estimate the sensitivity of a probe as the deviation of its intensity from the respective set average on a logarithmic scale.
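As an illustration of the sensitivity definition above, the sketch below computes, for one probe set, the deviation of each probe's log-intensity from the set mean. The function name, the choice of the base-10 logarithm, the use of the mean of the log-intensities as the set average, and the toy intensities are assumptions made for illustration; the text only specifies a logarithmic scale and a set average.

```python
import math


def probe_sensitivities(intensities):
    """Return log10-intensity deviations from the probe-set mean.

    `intensities` holds the positive intensities of the probes in one
    probe set (hypothetical input layout).
    """
    if not intensities:
        raise ValueError("probe set is empty")
    logs = [math.log10(value) for value in intensities]
    set_mean = sum(logs) / len(logs)
    return [log_value - set_mean for log_value in logs]


# Toy probe set with made-up intensities (not measured data)
example = [100.0, 1000.0, 10000.0]
print(probe_sensitivities(example))
```

By construction the sensitivities within a probe set sum to zero, so a positive value marks a probe that responds more strongly than the set average and a negative value one that responds more weakly.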
4 The sensitivity of a microarray oligonucleotide probe characterizes its ability to detect a certain amount of RNA transcripts independently of the conditions of sample preparation, hybridization, and measurement of the fluorescence intensity. It is mainly determined by the affinity of a particular DNA probe to bind RNA fragments via complementary Watson-Crick (WC) pairs. Second, each probe is present as a pair of so-called perfect match (PM) and mismatch (MM) modifications. The sequence of the PM is taken from the gene of interest, and thus, it is complementary to a 25-mer in the RNA target sequence. The sequence of the MM is identical to that of the PM probe except at the middle position of the oligomer, where the base is replaced by its complementary base. The pairwise design of probes is intended to measure the amount of nonspecific hybridization and, in this way, to correct the PM intensities.

An important question for GeneChip data analysis is how to include the MM intensities adequately. One prerequisite for solving this issue is a detailed study of the effect of the MM base in probe-target duplexes on the signal intensity. In the accompanying paper, 4 we found that the middle base systematically shifts the PM and MM probe sensitivities relative to one another. Other studies also reported that the strength of the base-pair interaction in the middle of the oligonucleotide affects the affinity of the probes for target binding to an extraordinary extent. 3,7

* Corresponding author. E-mail: binder@izbi.uni-leipzig.de. Fax: ++49-341-1495-119.
† Interdisciplinary Centre for Bioinformatics.
‡ Institute of Theoretical Chemistry and Structural Biology.
§ Department of Computer Science.
| Institute for Medical Informatics, Statistics and Epidemiology.

18015 J. Phys. Chem. B 2004, 108, 18015-18025 10.1021/jp049592o CCC: $27.50 © 2004 American Chemical Society Published on Web 10/27/2004

In addition, stacking interactions between nearest