Ad-Hoc Segmentation Pipeline for Microarray Image Analysis

S. Battiato, G. Di Blasi, G. M. Farinella, G. Gallo, G. C. Guarnera
{battiato, gdiblasi, gfarinella, gallo}@dmi.unict.it, g.guarnera@studenti.unict.it
IPLab – Image Processing Laboratory, http://www.dmi.unict.it/~iplab
Dipartimento di Matematica e Informatica, University of Catania, Via Andrea Doria 6 – 95125, Catania (Italy)

ABSTRACT

Microarrays are a new class of biotechnologies that help biological researchers extract new knowledge from biological experiments. Image analysis is devoted to extracting, processing and visualizing image information. For this reason it has also found application in Microarray technology, where it is a crucial step (e.g. segmentation). In this paper we describe MISP (Microarray Image Segmentation Pipeline), a new segmentation pipeline for Microarray image analysis. The pipeline uses a recent segmentation algorithm based on statistical analysis coupled with the K-Means algorithm. The spot masks produced by MISP are used to determine spot information and quality measures. A software prototype has been developed; it includes visualization, segmentation, and extraction of spot information and quality measures. Experiments show the effectiveness of the proposed pipeline both in terms of visual accuracy and measured quality values. Comparisons with existing solutions (e.g. Scanalyze [1]) confirm the improvement over previously published work.

Keywords: Image Analysis, Microarray, Image Segmentation, Bioinformatics

1. INTRODUCTION

Image analysis has found application in Microarray technology because it can extract new and non-trivial knowledge that is often hidden in the images. A Microarray is a small solid support on which DNA sequences from hundreds or thousands of different genes are spotted in fixed positions; the order of positioning lets the biomedical researcher identify a specific gene sequence.
Microarray technology has changed the way biologists work, allowing the observation of a great number of genes under several conditions simultaneously, in a single experiment. The Microarray workflow consists of several sequential phases: biomedical questions, experimental design, microarray experiment, image and data analysis, biological verification.

In each experiment (e.g. a comparison between two tissue samples), two 16-bit TIFF images are obtained by Microarray scanners that capture the fluorescence (Cy3, 510-550 nm, green; Cy5, 630-660 nm, red). The intensity in each image is proportional to the observed fluorescence. For visualization, the two images are combined into a single 24-bit RGB image in which the blue channel is set to zero, while R-G image compression is usually used. For each spot, the relative intensity of the red and green channels is determined [4]:

o prevalence of the red intensity denotes greater expression in the mutant sample;
o prevalence of the green intensity denotes greater expression in the reference sample;
o yellow denotes equal expression in the two samples.

Image analysis is a crucial aspect of Microarray experiments. It has a potentially large impact on subsequent analyses such as clustering or the identification of differentially expressed genes [5]. In the Microarray technology context, images are usually processed in the following steps [6]:

o Addressing or Gridding: assignment of spot coordinates;
o Segmentation: classification of pixels as foreground (the signal of interest) or background;
o Intensity extraction and quality measures: evaluation, for each spot, of the red and green foreground/background intensity pairs and of quality measures [7], [8], [9].
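
The channel composition and the per-spot processing steps listed above can be illustrated with a short sketch. It is not the MISP algorithm: the function names, the linear 16-bit to 8-bit scaling, the plain two-cluster K-Means used as a stand-in for segmentation, the median-based background correction and the tolerance `tol` are all assumptions chosen for illustration, and gridding is assumed to have already isolated one spot per patch.

```python
import numpy as np


def compose_rgb(cy5, cy3):
    """Build a 24-bit RGB preview from two 16-bit channel images."""
    # Assumption: plain linear scaling that keeps the high byte of each sample.
    red = (cy5.astype(np.uint16) >> 8).astype(np.uint8)
    green = (cy3.astype(np.uint16) >> 8).astype(np.uint8)
    blue = np.zeros_like(red)  # the blue channel is set to zero
    return np.stack([red, green, blue], axis=-1)


def segment_spot(patch, iters=20):
    """Foreground mask for one spot patch via two-cluster K-Means on intensity.

    Illustrative stand-in only; it does not reproduce the statistical
    analysis used by the pipeline described in the paper.
    """
    x = patch.astype(np.float64)
    lo, hi = x.min(), x.max()  # initial background / foreground centroids
    for _ in range(iters):
        fg = np.abs(x - hi) < np.abs(x - lo)
        if fg.all() or not fg.any():
            break  # degenerate patch: only one cluster is populated
        new_lo, new_hi = x[~fg].mean(), x[fg].mean()
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return np.abs(x - hi) < np.abs(x - lo)


def spot_log_ratio(cy5, cy3, mask):
    """log2(red/green) of background-corrected median intensities."""
    red = np.median(cy5[mask]) - np.median(cy5[~mask])
    green = np.median(cy3[mask]) - np.median(cy3[~mask])
    # Clamp to 1 so the logarithm stays defined for very weak spots.
    red, green = max(red, 1.0), max(green, 1.0)
    return float(np.log2(red / green))


def classify(log_ratio, tol=0.5):
    """Map a log-ratio to the colour reading used in the text."""
    if log_ratio > tol:
        return "red"      # greater expression in the mutant sample
    if log_ratio < -tol:
        return "green"    # greater expression in the reference sample
    return "yellow"       # roughly equal expression
```

A typical call sequence for one gridded patch would be `mask = segment_spot(cy5 + cy3)` on the summed channels (converted to floating point first to avoid 16-bit overflow), followed by `classify(spot_log_ratio(cy5, cy3, mask))`. Medians are used here only because they are less sensitive to outlier pixels than means; other estimators would fit the same structure.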