Data Mining in situ Gene Expression Patterns at Cellular Resolution

James Carson, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030
Christina Thaller, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030
Musodiq Bello, The University of Houston, 4800 Calhoun Rd., Houston TX 77204
Wah Chiu, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030
Tao Ju, Rice University, 6100 S. Main, Houston TX 77005
Joe Warren, Rice University, 6100 S. Main, Houston TX 77005
Ioannis Kakadiaris, The University of Houston, 4800 Calhoun Rd., Houston TX 77204
Gregor Eichele, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030

Abstract

In the post-genomic era, large-scale efforts have begun to characterize the role of gene products. Several of these efforts aim to systematically discover the activity of all ~20,000 genes throughout functionally complex tissue specimens such as the embryo and the mature brain. By applying a subdivision-based deformable model of the brain, we rapidly organize spatial gene expression data into a common coordinate system. This enables powerful queries, comparisons, and associations of the data.

1. Introduction

Non-radioactive in situ hybridization (ISH) is a powerful technique for revealing gene expression in individual cells, the level of detail necessary for investigating how genes control cell type identity, cell differentiation, and cell-cell signaling (Fig. 1). Although robotic ISH enables the expeditious determination of expression patterns for thousands of genes in serially sectioned tissues, a large collection of ISH images is, per se, of limited benefit [1]. However, once expression strength is accurately detected and expression location is spatially normalized across different specimens, ISH images become a minable resource of annotated gene expression, capable of advancing functional genomics in a manner similar to DNA sequence databases.

2. Methods

We have developed computational methods to automate ISH image annotation and applied them to over 200 genes throughout the postnatal mouse brain. First, gene expression strengths were semi-quantitatively characterized for each cell in a tissue section [2]. Atlas-based segmentation was then performed using a series of subdivision mesh maps that comprise our atlas of the postnatal mouse brain [3]. These maps were deformed to fit the tissue sections containing gene expression, and the detected expression strengths were associated with the directly overlying mesh to provide a common geometric annotation of gene expression. Automated textual annotation of expression patterns took advantage of the explicitly defined boundaries of the mesh.

3. Results

Automated textual annotations of gene expression patterns were found to accurately match the annotations determined visually by an expert. Spatial searches were successfully applied