[CANCER RESEARCH 61, 4320–4324, June 1, 2001]

Advances in Brief

Discovery of New Markers of Cancer through Serial Analysis of Gene Expression: Prostate Stem Cell Antigen Is Overexpressed in Pancreatic Adenocarcinoma¹

Pedram Argani,² Christophe Rosty, Robert E. Reiter, Robb E. Wilentz, Selva R. Murugesan, Steven D. Leach, Byungwoo Ryu, Halcyon G. Skinner, Michael Goggins, Elizabeth M. Jaffee, Charles J. Yeo, John L. Cameron, Scott E. Kern, and Ralph H. Hruban

Departments of Pathology [P. A., C. R., R. E. W., S. R. M., H. G. S., M. G., S. E. K., R. H. H.], Surgery [S. D. L., C. J. Y., J. L. C.], Oncology [S. D. L., B. R., M. G., E. M. J., C. J. Y., S. E. K., R. H. H.], and Medicine [M. G.], The Johns Hopkins Medical Institutions, Baltimore, Maryland 21287; Department of Epidemiology, The Johns Hopkins School of Public Health, Baltimore, Maryland 21287 [H. G. S.]; and Department of Urology, University of California, Los Angeles, California 90095 [R. E. R.]

Abstract

Serial analysis of gene expression (SAGE) can be used to quantify gene expression in human tissues. Comparison of gene expression levels in neoplastic tissues with those seen in nonneoplastic tissues can, in turn, identify novel tumor markers. Such markers are urgently needed for highly lethal cancers like pancreatic adenocarcinoma, which typically presents at an incurable, advanced stage. The results of SAGE analyses of a large number of neoplastic and nonneoplastic tissues are now available online, facilitating the rapid identification of novel tumor markers. We searched an online SAGE database to identify genes preferentially expressed in pancreatic cancers as compared with normal tissues. SAGE libraries derived from pancreatic adenocarcinomas were compared with SAGE libraries derived from nonneoplastic tissues. Three promising tags were identified.
Two of these tags corresponded to genes (lipocalin and trefoil factor 2) previously shown to be overexpressed in pancreatic carcinoma, whereas the third tag corresponded to prostate stem cell antigen (PSCA), a recently discovered gene thought to be largely restricted to prostatic basal cells and prostatic adenocarcinomas. PSCA was expressed in four of the six pancreatic cancer SAGE libraries, but not in the libraries derived from normal pancreatic ductal cells. We confirmed the overexpression of the PSCA mRNA transcript in 14 of 19 pancreatic cancer cell lines by reverse transcription-PCR, and using immunohistochemistry, we demonstrated PSCA protein overexpression in 36 of 60 (60%) primary pancreatic adenocarcinomas. In 59 of 60 cases, the adjacent nonneoplastic pancreas did not label for PSCA. PSCA is a novel tumor marker for pancreatic carcinoma that has potential diagnostic and therapeutic implications. These results establish the validity of analyses of SAGE databases to identify novel tumor markers.

Introduction

SAGE³ is a recently described technique that allows one to obtain a quantitative and comprehensive profile of cellular gene expression (1, 2). Briefly, in this procedure, cellular mRNA transcripts are converted to cDNA and then cleaved at specific sites by restriction enzymes into small (10–14 bp) fragments, also known as tags. These tags are ligated together into difragments, amplified by PCR, and then concatenated and sequenced as one long fragment of DNA. Each 10–14-bp fragment (tag) should uniquely identify a specific gene transcript because it corresponds to a defined sequence near the transcript's 3' terminus, as dictated by the tagging restriction enzyme used (1). The abundance of each tag provides a quantitative measure of the transcript level present within the mRNA sample analyzed, which therefore allows expression levels of specific transcripts to be compared between two samples (2).
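The counting step that makes SAGE quantitative can be illustrated with a minimal Python sketch. It assumes the tags have already been extracted from the sequenced concatemers; the tag sequences and the function name `tag_abundance` are hypothetical, invented for illustration, and are not taken from the study.

```python
from collections import Counter


def tag_abundance(tags):
    """Return {tag: (count, fraction of all tags in the library)}."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.items()}


# Hypothetical 10-bp tags (illustrative only, not real SAGE data).
library = [
    "AAACCCGGGT",
    "AAACCCGGGT",
    "TTTGGGCCCA",
    "AAACCCGGGT",
    "GATTACAGAT",
]

abundance = tag_abundance(library)
for tag, (count, fraction) in sorted(abundance.items()):
    print(f"{tag}\t{count}\t{fraction:.2f}")
```

Because each tag is taken to represent one transcript, the fraction of the library occupied by a tag serves as an estimate of that transcript's relative expression level in the sample.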
This ability to quantitate gene expression represents a major advantage of SAGE over other methods of screening cDNA libraries for differentially expressed genes.

In the initial demonstration of the SAGE technique, a gene expression profile of the normal pancreas was constructed and validated by Northern blotting (1). Subsequently, Zhang et al. (2) used SAGE to demonstrate differences in expression patterns between colonic and pancreatic adenocarcinomas and normal colonic epithelium. Such applications of SAGE hold tremendous promise for the identification of diagnostic and/or prognostic markers of malignancy. Indeed, the above-referenced analyses identified several promising serum markers for pancreatic carcinoma, such as tissue inhibitor of metalloproteinase 1 (3).

Three recent advances have made analyses of SAGE libraries for differentially expressed genes more feasible. First, rapid progress in the Human Genome Project has facilitated the mapping of specific genes to individual tags specified by SAGE (4). Fewer tags now correspond to ESTs of unknown origin, and more can be assigned to known genes. Second, a large number of normal and neoplastic tissues have now been analyzed by SAGE, creating extremely large databases for study. Third, much of this database is now online and available to the general public (5, 6).⁴ As of February 1, 2001, this online database included 88 SAGE libraries and 3,632,974 tags. Armed with these tools, we searched an online SAGE database to identify novel markers of pancreatic adenocarcinoma.

Materials and Methods

Based on the identification of differentially expressed genes in our ongoing SAGE investigation of pancreatic cancer,⁵ the xProfiler program available online⁴ was used to compare gene expression patterns in pancreatic cancer with those in nonneoplastic tissues. In this program, one can select SAGE libraries for analysis and then compare the tags in one group of online SAGE libraries with the tags in another group.
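A group-versus-group comparison of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions and is not the xProfiler algorithm, whose internal statistics are not described here: the libraries in each group are simply pooled, and a tag's count is normalized to tags per 100,000 so that groups of different sizes can be compared. The tag names, counts, and function names are hypothetical.

```python
from collections import Counter


def pool(libraries):
    """Sum tag counts across all libraries in a group."""
    pooled = Counter()
    for library in libraries:
        pooled.update(library)  # adds counts when given a mapping
    return pooled


def per_100k(count, total):
    """Normalize a raw tag count to tags per 100,000 (assumed scale)."""
    return count * 100_000 / total


def compare(group_a, group_b, tag):
    """Return the normalized abundance of `tag` in each pooled group."""
    a, b = pool(group_a), pool(group_b)
    return (
        per_100k(a[tag], sum(a.values())),
        per_100k(b[tag], sum(b.values())),
    )


# Hypothetical libraries as {tag: count}; values are illustrative only.
tumor_libraries = [{"TAG_A": 8, "TAG_B": 92}, {"TAG_A": 2, "TAG_B": 98}]
normal_libraries = [{"TAG_B": 100}]

tumor_level, normal_level = compare(tumor_libraries, normal_libraries, "TAG_A")
print(tumor_level, normal_level)
```

A tag that is abundant in the pooled tumor group and absent from the pooled normal group, like the hypothetical `TAG_A` here, is the kind of candidate such a query is meant to surface. Any real analysis would also need a statistical test to account for sampling error at low counts.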
We used two queries to determine differentially expressed genes. In the first strategy, we chose a pancreatic adenocarcinoma group composed of the SAGE libraries of four pancreatic cancer cell lines that yielded 96,494 total tags (CAPAN1, 37,926 tags; CAPAN2, 23,222 tags; HS766T, 10,467 tags; and Panc1, 24,879 tags). The nonneoplastic comparison group in this analysis was composed of the SAGE libraries of two short-term cultures of normal pancreatic duct epithelial cells that yielded 64,577 tags (HX, 32,157 tags; and H126, 32,420 tags). In the second query, we expanded both

Received 2/22/01; accepted 4/12/01.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¹ Supported by the Specialized Program of Research Excellence (SPORE) in Gastrointestinal Cancer p50-CA62924, The National Pancreas Foundation, and The Michael Rolfe Fund for pancreatic cancer research.
² To whom requests for reprints should be addressed, at The Johns Hopkins Hospital-Surgical Pathology, The Harry and Jeanette Weinberg Building, 401 North Broadway, Room 2242, Baltimore, MD 21231-2410. Phone: (410) 614-2428; Fax: (410) 955-0115; E-mail: pargani@jhmi.edu.
³ The abbreviations used are: SAGE, serial analysis of gene expression; PanIN, pancreatic intraepithelial neoplasia; PSCA, prostate stem cell antigen; TFF2, trefoil factor 2; RT-PCR, reverse transcription-PCR; EST, expressed sequence tag.
⁴ http://www.ncbi.nlm.nih.gov/SAGE.
⁵ B. Ryu, J. Jones, M. A. Hollingsworth, R. H. Hruban, and S. E. Kern. Identification of differentially expressed genes by serial analysis of gene expression profiling in pancreatic cancer, manuscript in preparation.
© 2001 American Association for Cancer Research.