Bayesian methods for microarray data

Alex Lewin and Sylvia Richardson
Department of Epidemiology and Public Health, Imperial College, Norfolk Place, London W2 1PG, UK
email: a.m.lewin@imperial.ac.uk

January 11, 2007

Summary

We review the use of Bayesian methods for analyzing gene expression data. We focus on methods which select groups of genes on the basis of their expression in RNA samples derived under different experimental conditions. We first describe Bayesian methods for estimating gene expression level from the intensity measurements obtained from analysis of microarray images. We next discuss the issues involved in assessing differential gene expression between two conditions at a time, including models for classifying the genes as differentially expressed or not. In the last two sections, we present models for grouping gene expression profiles over different experimental conditions, in order to find co-expressed genes, and multivariate models for finding gene signatures, i.e. for selecting a parsimonious group of genes that discriminate between entities such as subtypes of disease.

Keywords: Bayesian hierarchical models, differential expression, profile clustering, mixture models, gene selection, shrinkage priors, models for cDNA arrays, models for oligonucleotide arrays, gene expression.

Related Chapters: hsg006, hsg007

1. Introduction

High throughput technologies such as DNA microarrays have emerged over the last 5-10 years as one of the key sources of information for functional genomics. Microarrays permit researchers to capture one of the fundamental processes in molecular biology: the transcription of genes into mRNA (messenger RNA), which is subsequently translated to form proteins. This process is called gene expression. By quantifying the amount of transcription, microarrays make it possible to identify the genes that are expressed in different types of cells and tissues, and to understand the cellular processes in which they intervene, thus giving a unique insight into the function of genes. However, transforming the huge quantity of data which is currently produced