Applications of Multilevel Thresholding Algorithms to Transcriptomics Data

Luis Rueda and Iman Rezaeian

School of Computer Science, University of Windsor, 401 Sunset Ave., Windsor, ON, N9B3P4, Canada
{lrueda,rezaeia}@uwindsor.ca

Abstract. Microarrays are one of the methods for analyzing the expression levels of genes in a massively parallel way. Since any errors in early stages of the analysis affect subsequent stages, leading to possibly erroneous biological conclusions, finding the correct location of the spots in the images is extremely important for subsequent steps that include segmentation, quantification, normalization and clustering. On the other hand, genome-wide profiling of DNA-binding proteins using ChIP-seq and RNA-seq has emerged as an alternative to ChIP-chip methods. Due to the large amounts of data produced by next generation sequencing technology, ChIP-seq and RNA-seq offer much higher resolution, less noise and greater coverage than their predecessor, the ChIP-chip array.

Multilevel thresholding algorithms have been applied to many problems in image and signal processing. We show that these algorithms can be used for transcriptomics and genomics data analysis such as sub-grid and spot detection in DNA microarrays, and also for detecting significant regions based on next generation sequencing data. We show the advantages and disadvantages of using multilevel thresholding and other algorithms in these two applications, as well as an overview of numerical and visual results used to validate the power of the thresholding methods based on previously published data.

Keywords: microarray image gridding, image analysis, multilevel thresholding, transcriptomics.

1 Introduction

Among other components, the genome contains a set of genes required for an organism to function and evolve. However, the genome is only a source of information, and in order to function, the genes express themselves into proteins.
The transcription of genes to produce RNA is the first stage of gene expression. The transcriptome can be seen as the complete set of RNA transcripts produced by the genome. Unlike the genome, the transcriptome is very dynamic. Although the genome is the same regardless of the type of cell or environmental conditions, the transcriptome varies considerably in differing circumstances because genes may be expressed in different ways. Transcriptomics, the field that studies the role of the transcriptome, provides a rich source of data suitable for pattern discovery and analysis. The quantity and size of these data may vary based on the model and underlying methods used for analysis. In gene

C. San Martin and S.-W. Kim (Eds.): CIARP 2011, LNCS 7042, pp. 26–37, 2011.
© Springer-Verlag Berlin Heidelberg 2011