Data Rotation Improves Genomotyping Efficiency

Dirk Repsilber *,1, Alex Mira 2, Hillevi Lindroos 3, Siv Andersson 3, and Andreas Ziegler 1

1 Institut für Medizinische Biometrie und Statistik, Ratzeburger Allee 160, Universität zu Lübeck, 23538 Lübeck, Germany
2 División de Microbiología, Universidad Miguel Hernández, 03550 Alicante, Spain
3 Department of Molecular Evolution, Evolutionary Biology Center, Uppsala University, 75236 Uppsala, Sweden

Received 17 November 2004, revised 6 April 2005, accepted 6 May 2005

Summary

Unsequenced bacterial strains can be characterized by comparing their genomic DNA to a sequenced reference genome of the same species. This comparative genomic approach, also called genomotyping, is leading to an increased understanding of bacterial evolution and pathogenesis. It is efficiently accomplished by comparative genomic hybridization on custom-designed cDNA microarrays. The microarray experiment results in fluorescence intensities for reference and sample genome for each gene. The log-ratio of these intensities is usually compared to a cut-off, classifying each gene of the sample genome as a candidate for an absent or present gene with respect to the reference genome. Reducing the usually high rate of false positives in the list of candidates for absent genes is decisive for both time and costs of the experiment. We propose a novel method to improve efficiency of genomotyping experiments in this sense, by rotating the normalized intensity data before setting up the list of candidate genes. We analyze simulated genomotyping data and also re-analyze an experimental data set for comparison and illustration. We approximately halve the proportion of false positives in the list of candidate absent genes for the example comparative genomic hybridization experiment as well as for the simulation experiments.

Key words: Comparative genomic hybridization, Microarray, False discovery proportion, Data transformation.
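As an illustration of the conventional cut-off rule mentioned in the Summary, the following minimal Python sketch classifies genes as candidates for absence from their log-ratios. It is not the authors' implementation: the function name `candidate_absent_genes`, the use of base-2 logarithms, the direction of the ratio (sample over reference), the cut-off value of -1, and the intensity values are all assumptions made for this example only. The rotation step proposed in the paper is not shown.

```python
import math


def candidate_absent_genes(reference, sample, cutoff=-1.0):
    """Return genes whose log2(sample / reference) lies below the cut-off.

    reference, sample: dicts mapping gene name -> positive fluorescence
    intensity (hypothetical, already normalized values).
    cutoff: assumed threshold on the log-ratio; not taken from the paper.
    """
    candidates = []
    for gene, ref_intensity in reference.items():
        log_ratio = math.log2(sample[gene] / ref_intensity)
        if log_ratio < cutoff:
            candidates.append(gene)
    return candidates


# Hypothetical intensities for three genes of the reference genome.
reference = {"geneA": 1200.0, "geneB": 900.0, "geneC": 1500.0}
sample = {"geneA": 1100.0, "geneB": 150.0, "geneC": 1600.0}

absent = candidate_absent_genes(reference, sample)
print(absent)
```

With these made-up values only `geneB` has a strongly negative log-ratio and therefore enters the candidate list. In real data, noise pushes some present genes below any fixed cut-off, which is the source of the false positives the paper aims to reduce.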
* Corresponding author: e-mail: dirk.repsilber@imbs.uni-luebeck.de, Phone: +49 451 500 2788, Fax: +49 451 500 2999

Biometrical Journal 47 (2005) 4, 585–598. DOI: 10.1002/bimj.200410160. © 2005 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

1 Introduction

Comparing genomes in families of pathogenic bacteria, where the different bacterial strains differ in host specificity and virulence, enables a better understanding of bacterial pathogenesis and evolution, resulting in improved diagnostics or selection of potential targets for vaccine development. This approach has given rise to the field of microbial comparative genomics, also called bacterial genomotyping (detailed overviews in Joyce et al., 2002; Fitzgerald et al., 2001; Dorrell et al., 2002). Here, genomes of unsequenced bacterial strains are compared to a sequenced reference genome of the same bacterial species to identify absent or highly divergent genes.

Comparative genomic hybridization on custom-designed low-cost cDNA microarrays (Dorrell et al., 2001) represents an efficient technology for this approach. Genomic DNAs of reference and sample strains are labeled with two fluorescence colors and hybridized onto the microarray, which is prepared from all genes of the reference genome. Genes present in both genomes are expected to give a clear fluorescence signal for both colors, whereas genes which are absent in the sample genome ideally give only a signal for the reference color. The conventional analysis, as described in Section 2, therefore builds upon ranking genes by