[CANCER RESEARCH 61, 5979–5984, August 15, 2001]

Advances in Brief

Estrogen Receptor Status in Breast Cancer Is Associated with Remarkably Distinct Gene Expression Patterns 1

Sofia Gruvberger, Markus Ringnér, Yidong Chen, Sujatha Panavally, Lao H. Saal, Åke Borg, Mårten Fernö, Carsten Peterson, and Paul S. Meltzer 2

Department of Oncology [S. G., Å. B., M. F.] and Complex Systems Division, Department of Theoretical Physics [M. R., C. P.], Lund University, SE-221 00 Lund, Sweden, and Cancer Genetics Branch, National Human Genome Research Institute [S. G., M. R., Y. C., S. P., L. H. S., P. S. M.], NIH, Bethesda, Maryland 20892

Abstract

To investigate the phenotype associated with estrogen receptor (ER) expression in breast carcinoma, gene expression profiles of 58 node-negative breast carcinomas discordant for ER status were determined using DNA microarray technology. Using artificial neural networks as well as standard hierarchical clustering techniques, the tumors could be classified according to ER status, and a list of genes which discriminate tumors according to ER status was generated. The artificial neural networks could accurately predict ER status even when excluding top discriminator genes, including ER itself. By reference to the serial analysis of gene expression database, we found that only a small proportion of the 100 most important ER discriminator genes were also regulated by estradiol in MCF-7 cells. The results provide evidence that ER+ and ER- tumors display remarkably different gene-expression phenotypes not solely explained by differences in estrogen responsiveness.

Introduction

Estrogens are important regulators of growth and differentiation in the normal mammary gland and are also important in the development and progression of breast carcinoma.
Estrogens regulate gene expression via ER; 3 however, the details of the estrogen effect on downstream gene targets, the role of cofactors, and cross-talk with other signaling pathways are far from fully understood. As approximately two-thirds of all breast cancers are ER+ at the time of diagnosis, the expression of the receptor has important implications for their biology and therapy (1). Opinions differ as to whether those breast cancers which lack ER expression at diagnosis arise from an ER- compartment within the mammary epithelium or represent evolution from an ER+ to an ER- state (2).

The cDNA microarray technology allows for parallel analysis of the expression of thousands of genes (3) to address complex questions in tumor biology. Statistical tools are required to analyze the large amount of expression data generated by this methodology. ANNs are computer-based algorithms for pattern recognition that are capable of learning from experience (4). The diagnosis of myocardial infarcts (5) and heart arrhythmias from electrocardiograms (6) are examples of applications of ANNs in medicine. We have recently demonstrated the utility of ANNs for the diagnostic classification of tumors using cDNA microarray data (7). In this study, we have applied ANNs as well as conventional methods to analyze cDNA microarray data from a selected group of node-negative breast cancers that differ with respect to their ER status. Here we report that ER+ and ER- tumors display remarkably different phenotypes, which may be attributable to their evolution from distinct cell lineages.

Materials and Methods

Tissues and Cells. Fifty-eight grossly dissected primary tumors from node-negative breast cancer patients, tumor size 20–50 mm, were collected at the University Hospital, Lund, Sweden. Microscopic examination of touch preparations verified the presence of cancer cells in all samples.
To train the classifier described below, 47 tumors, all from two previous randomized studies (Ref. 8) 4, were selected so that roughly half, 23, were ER+ (range, 50–1900 fmol/mg protein; median, 160), whereas the remaining 24 were ER- (range, 0–9 fmol/mg protein; median, 0.7). In addition, 14 of the patients were premenopausal (5 ER+ and 9 ER-) and 33 were postmenopausal (18 ER+ and 15 ER-). The remaining 11 of the 58 tumors were selected from an ongoing clinical trial and used here as an independent, blinded test set. Of the 11 blinded samples, 5 were ER+ (range, 40–120 fmol/mg protein; median, 60), 6 were ER- (range, 0–3 fmol/mg protein; median, 1.5), and all were premenopausal. ER protein determinations were performed using standard methods in the routine clinical laboratory (9). BT-474 cells, obtained from American Type Culture Collection, were maintained in RPMI 1640 supplemented with 10% fetal bovine serum, penicillin, and streptomycin. Cells were harvested at 60–80% confluency and used as a reference in all hybridizations.

RNA Isolation and cDNA Microarrays. Total RNA was isolated from cell lines using the RNeasy kit (Qiagen, Valencia, CA) with subsequent Trizol (Life Technologies, Inc., Rockville, MD) purification. Total RNA from tumors was isolated using two successive rounds of Trizol. Microarrays were prepared and hybridized as described previously (3, 10, 11) and according to standard protocols. 5 Briefly, the arrays were spotted with 6,728 sequence-verified cDNA clones, of which 4,000 were named human genes and the remainder were expressed sequence tags. BT-474 RNA (200 µg) and 65–100 µg of tumor RNA were used to produce labeled cDNA by anchored oligo(dT)-primed reverse transcription using SuperScript II reverse transcriptase (Life Technologies, Inc.) in the presence of either Cy5-dUTP or Cy3-dUTP (Amersham Pharmacia, Piscataway, NJ), respectively.
Fluorescence scanning and image analysis with DeArray software were performed as described previously (12, 13).

Data Analysis. For each gene, the fluorescence intensity of the most intense channel [red (Cy5) or green (Cy3)] in each sample was averaged over all samples. All genes for which this average exceeded 2,000 fluorescence units (scale, 0–65,535 units) were included in the analysis. In addition, we required, for all samples, that the red and green intensities both exceed 20 fluorescence units and that the spot area of the union of the two channels exceed 30 pixels. For the 58 (47 + 11) measured samples, these requirements left us with

Received 4/26/01; accepted 6/25/01.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 Supported in part by the Swedish Research Council and the Knut and Alice Wallenberg Foundation through the SWEGENE consortium (to M. R.) and the Swedish Foundation for Strategic Research (to C. P.). This work was partly supported by grants from the Lund University Medical Faculty, the Swedish Cancer Society, Berta Kamprad's Foundation, the Gunnar Arvid and Elisabeth Nilsson Foundation, the Hospital of Lund Foundations, the E and F Bergqvist Foundation, and King Gustav V's Jubilee Foundation.
2 To whom requests for reprints should be addressed, at National Human Genome Research Institute, NIH, 49 Convent Drive, Bethesda, MD 20892-4470. Phone: (301) 594-5283; Fax: (301) 402-3281; E-mail: pmeltzer@nhgri.nih.gov.
3 The abbreviations used are: ER, estrogen receptor α; ANN, artificial neural network; E2, estradiol; PCA, principal component analysis; ROC, receiver operating characteristic; MDS, multidimensional scaling; WGA, weighted gene analysis; SAGE, serial analysis of gene expression; GATA3, GATA-binding protein 3; TFF3, trefoil factor 3.
4 Å. Borg, M. Fernö, unpublished results.
5 Internet address: http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/protocol.html.
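The spot quality filter described under Data Analysis can be illustrated with a minimal Python sketch. It is not the DeArray software or the authors' code: the function name quality_filter, the genes-by-samples array layout, and the toy intensity values are assumptions made for illustration. Only the three thresholds (2,000 fluorescence units for the mean peak intensity, 20 units per channel, 30 pixels of spot area) are taken from the text.

```python
import numpy as np

def quality_filter(red, green, spot_area,
                   min_mean_peak=2000.0, min_channel=20.0, min_area=30.0):
    """Return a boolean mask over genes (rows) that pass all three criteria.

    Hypothetical layout: each input has shape (n_genes, n_samples).
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    spot_area = np.asarray(spot_area, dtype=float)

    # Criterion 1: intensity of the brighter channel in each sample,
    # averaged over all samples, must exceed min_mean_peak.
    peak = np.maximum(red, green)
    bright_enough = peak.mean(axis=1) > min_mean_peak

    # Criterion 2: both channels must exceed min_channel in every sample.
    channels_ok = ((red > min_channel) & (green > min_channel)).all(axis=1)

    # Criterion 3: the spot area must exceed min_area pixels in every sample.
    area_ok = (spot_area > min_area).all(axis=1)

    return bright_enough & channels_ok & area_ok

# Toy data (invented): 3 genes x 2 samples.
red = [[5000, 4000], [3000, 2500], [100, 150]]
green = [[1000, 6000], [10, 2600], [120, 90]]
area = [[50, 60], [45, 40], [55, 35]]

keep = quality_filter(red, green, area)
print(keep)
```

In this toy input the first gene passes all criteria, the second fails because one green intensity is below 20 units, and the third fails because its mean peak intensity is far below 2,000 units.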