Empirical Bayes Identification of Tumor Progression Genes from Microarray Data

Debashis Ghosh *,1 and Arul M. Chinnaiyan 2

1 Department of Biostatistics, University of Michigan, 1420 Washington Heights, Ann Arbor, Michigan 48109-2029, U.S.A.
2 Departments of Pathology and Urology, University of Michigan, U.S.A.

Received 20 May 2006, revised 2 June 2006, accepted 22 September 2006

Summary

The use of microarray data has become commonplace in medical and scientific experiments. We focus here on microarray data generated from cancer studies. Identifying genes whose expression levels correlate with tumor progression is potentially important for the discovery of biomarkers. In this article, we propose a simple procedure for the identification of such genes, which we term tumor progression genes. The first stage involves estimation based on the proportional odds model. At the second stage, we calculate two quantities, a q-value and a shrinkage estimator of the test statistic, to adjust for the multiple testing problem. The relationship between the proposed method and the false discovery rate is studied. The proposed methods are applied to data from a prostate cancer microarray study.

Key words: Gene Expression; Metastasis; Mixture Models; Multiple Comparisons; Prostate Cancer.

1 Introduction

The use of DNA microarray technology has allowed for new understanding of various cancers. The hybridization of cDNA to arrays containing thousands of genes and ESTs permits a global genome-wide evaluation of tumor samples. This technology has led to the development of statistical methodology in various areas of microarray data analysis, such as methods for differential expression (Efron et al., 2001; Dudoit et al., 2002b), clustering (Eisen et al., 1998) and classification (Hastie et al., 2000; Dudoit et al., 2002a).

The motivating example is from a microarray experiment in prostate cancer (Dhanasekaran et al., 2001).
We have profiled tissue samples from various stages of prostate cancer (e.g., normal adjacent prostate, benign prostatic hyperplasia, localized prostate cancer, advanced metastatic prostate cancer). The samples are linked to a patient clinical database that contains other parameters, such as Gleason score, survival time and status, and time to PSA recurrence. One of the main hypotheses of interest to scientists is that there exist distinct sets of genes and proteins that dictate progression from precursor lesion, to localized disease, and finally to metastatic disease. This hypothesis is biological in nature and focuses on learning which genes are involved in cancer pathways. We will refer to genes satisfying this hypothesis as tumor progression genes.

The ideal design for studying the development of gene expression profiles in tumors would be a longitudinal experiment. The tumor is commonly thought to originate as a progenitor cell and to go through several stages of progression (e.g., benign hyperplasia, in situ). Such a model for tumor progression has been postulated by Fearon and Vogelstein (1990). If it were possible to sample the same

* Corresponding author: e-mail: ghoshd@umich.edu, Phone: 007346159824, Fax: 007347632215

Biometrical Journal 49 (2007) 1, 68–77 DOI: 10.1002/bimj.200610312
© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim