Genetica 108: 41–46, 2000.
© 2000 Kluwer Academic Publishers. Printed in the Netherlands.

High throughput screening of gene expression signatures

A. Kuklin, S. Shams & S. Shah

BioDiscovery Inc., 11150 West Olympic Boulevard, Suite 1170, Los Angeles, CA 90064, USA

Address for correspondence: 11150 West Olympic Boulevard, Suite 1170, Los Angeles, CA 90064, USA (Phone: 310 966 9366; Fax: 310 966 9346; E-mail: akuklin@biodiscovery.com)

Key words: data analysis, image analysis, microarrays, software, automation

Abstract

This paper focuses on microarray image analysis and describes a completely automated approach to image processing that eliminates the need for human intervention. The system processes image files in batch mode, allowing high-throughput microarray image analysis. Grid placement and spot finding are performed without operator assistance. The software eliminates noise signals from the data analysis process and minimizes operator involvement in the procedure.

Introduction

Drug discovery is being transformed by the introduction of new automated technologies and bioinformatics applications. With cDNA microarray technology, gene expression during a drug study can be dissected precisely and in high-throughput fashion; the technology provides an unprecedented means of carrying out large-scale gene expression experiments (Debouck & Goodfellow, 1999). Comprehensive genome-wide surveys of gene expression patterns are being applied to various genomes (see Brown & Botstein, 1999). Microarrays are microscope slides or membranes carrying anywhere from hundreds to tens of thousands of immobilized DNA samples (Duggan et al., 1999). This array of cDNA spots is then probed with fluorescently labeled cDNAs, which are obtained by reverse transcription of total RNA pools from the test and reference biological sources.
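The abstract states that grid placement and spot finding require no operator input, but it does not say how this is done. One widely used idea, shown here only as a hypothetical Python sketch and not as the authors' algorithm, is to sum pixel intensities along every row and every column of a scanned channel and to treat the peaks of these projection profiles as the centers of spot rows and columns. The function names, the mean-based peak threshold, the `min_separation` parameter and the toy image are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one scanned channel as rows of 16-bit pixel values.
Image = List[List[int]]


def projection(image: Image, axis: int) -> List[int]:
    """Sum pixel values across each row (axis=1) or down each column (axis=0)."""
    if axis == 1:
        return [sum(row) for row in image]
    return [sum(column) for column in zip(*image)]


def find_peaks(profile: Sequence[int], min_separation: int) -> List[int]:
    """Indices of local maxima that lie above the profile mean.

    Peaks closer than min_separation are merged, keeping the stronger one,
    so that each spot row or column yields a single grid line.
    """
    threshold = sum(profile) / len(profile)
    peaks: List[int] = []
    for i in range(1, len(profile) - 1):
        is_max = profile[i - 1] <= profile[i] > profile[i + 1]
        if not is_max or profile[i] <= threshold:
            continue
        if peaks and i - peaks[-1] < min_separation:
            if profile[i] > profile[peaks[-1]]:
                peaks[-1] = i  # replace the weaker nearby peak
            continue
        peaks.append(i)
    return peaks


def place_grid(image: Image, min_separation: int) -> List[Tuple[int, int]]:
    """Estimate spot centers as intersections of peak rows and peak columns."""
    rows = find_peaks(projection(image, axis=1), min_separation)
    cols = find_peaks(projection(image, axis=0), min_separation)
    return [(r, c) for r in rows for c in cols]


def synthetic_image(size: int, centers: Sequence[Tuple[int, int]]) -> Image:
    """Toy test image: background 100, each spot a 3x3 patch with a bright center."""
    image = [[100] * size for _ in range(size)]
    for r0, c0 in centers:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                image[r0 + dr][c0 + dc] = 5000 if dr == dc == 0 else 2000
    return image


if __name__ == "__main__":
    img = synthetic_image(20, [(4, 4), (4, 14), (14, 4), (14, 14)])
    print(place_grid(img, min_separation=5))
    # → [(4, 4), (4, 14), (14, 4), (14, 14)]
```

Real slides show rotation, uneven spacing and missing spots, so a practical gridder needs considerably more than this; the sketch only shows why automatic placement is feasible when spots lie on a regular lattice.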
The power of this methodology lies in its ability to register many hybridization signals simultaneously, signals that accurately reflect physiological dynamics. A single microarray project may involve several microarrays and a variety of associated information, ranging from sequence data on the genes or clones placed on each slide to quantified expression values for each gene under different experimental conditions.

After hybridization with the two dye-tagged probes (or with probes prepared by other labeling methods), the microarray is scanned to generate two images, one for each dye ‘color’. The intensity at each point in an image therefore reflects the amount of probe labeled with the corresponding dye at that position. These images are typically stored as 16-bit TIFF files containing up to 20,000,000 picture elements (pixels). The fundamental goal of array image processing is to measure the intensity of the arrayed spots and to quantify their expression values from those intensities. Another important aspect of array image processing is assessing the reliability of the quantified spot data, which aids the later stages of data analysis.

Until now, the challenge has been to convert pixel values into accurately quantified gene expression data without compromising the biological signal. With the growing number of microarray experiments and users, and the acceptance of microarray technology as a tool for whole-genome expression analysis, the goal is now to process microarray images automatically and with high accuracy. In this report we describe the development and applications of a completely automated system for microarray image analysis, which processes microarray images in batch mode without human intervention.
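The two goals named above, measuring spot intensities and judging how reliable they are, can be illustrated with a minimal Python sketch. It is an assumption-laden example rather than the method of this paper: spot centers are taken as already known (for example from automatic grid placement), the foreground is the mean of the pixels within a fixed radius, the local background is the median of a surrounding ring, and the expression value is the background-corrected log2 ratio of the test channel to the reference channel. The reliability rule, the parameter values and all names (`quantify_spot`, `SpotResult`, `min_fg_to_bg`, `disk_image`) are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical representation: one scanned channel as rows of 16-bit pixel values.
Image = List[List[int]]


@dataclass
class SpotResult:
    test_signal: float           # background-corrected test-channel intensity
    reference_signal: float      # background-corrected reference-channel intensity
    log2_ratio: Optional[float]  # log2(test / reference), None if undefined
    reliable: bool               # hypothetical quality flag, see quantify_spot


def pixels_between(image: Image, center: Tuple[int, int],
                   r_min: float, r_max: float) -> List[int]:
    """Values of pixels whose distance d from center satisfies r_min <= d <= r_max."""
    cr, cc = center
    reach = math.ceil(r_max)
    values = []
    for r in range(max(0, cr - reach), min(len(image), cr + reach + 1)):
        for c in range(max(0, cc - reach), min(len(image[0]), cc + reach + 1)):
            if r_min <= math.hypot(r - cr, c - cc) <= r_max:
                values.append(image[r][c])
    return values


def quantify_spot(test: Image, reference: Image, center: Tuple[int, int],
                  spot_radius: float = 2.0, ring_width: float = 2.0,
                  min_fg_to_bg: float = 2.0) -> SpotResult:
    """Background-corrected intensities and log2 ratio for one spot.

    Foreground: mean of pixels within spot_radius of the center.
    Background: median of a ring starting one pixel outside the spot.
    The spot is flagged unreliable if, in either channel, the foreground
    mean is below min_fg_to_bg times the background median, or if either
    corrected signal is not positive.
    """
    signals = []
    reliable = True
    for image in (test, reference):
        foreground = statistics.mean(pixels_between(image, center, 0.0, spot_radius))
        background = statistics.median(pixels_between(
            image, center, spot_radius + 1.0, spot_radius + 1.0 + ring_width))
        signals.append(float(foreground - background))
        if foreground < min_fg_to_bg * background:
            reliable = False
    test_signal, reference_signal = signals
    if test_signal <= 0 or reference_signal <= 0:
        return SpotResult(test_signal, reference_signal, None, False)
    ratio = math.log2(test_signal / reference_signal)
    return SpotResult(test_signal, reference_signal, ratio, reliable)


def disk_image(size: int, center: Tuple[int, int], radius: float,
               spot_value: int, background: int = 100) -> Image:
    """Toy channel: flat background with one uniform circular spot."""
    return [[spot_value if math.hypot(r - center[0], c - center[1]) <= radius
             else background for c in range(size)] for r in range(size)]


if __name__ == "__main__":
    test_channel = disk_image(15, (7, 7), 2.0, spot_value=900)
    reference_channel = disk_image(15, (7, 7), 2.0, spot_value=300)
    print(quantify_spot(test_channel, reference_channel, (7, 7)))
    # → SpotResult(test_signal=800.0, reference_signal=200.0, log2_ratio=2.0, reliable=True)
```

Using a median for the background makes the estimate less sensitive to bright debris in the ring than a mean would be. For scale, a 16-bit pixel occupies 2 bytes, so a 20,000,000-pixel image carries about 40 MB of raw intensity data per channel, or about 80 MB for a two-color scan.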