Analyzing Patterns of Staining in Immunohistochemical Studies: Application to a Study of Prostate Cancer Recurrence

Ruth Etzioni,1 Sarah Hawley,1 Dean Billheimer,3 Lawrence D. True,2 and Beatrice Knudsen1

1 Fred Hutchinson Cancer Research Center; 2 Department of Pathology, University of Washington, Seattle, Washington; and 3 Division of Biostatistics, Vanderbilt-Ingram Cancer Center, Nashville, Tennessee

Abstract

Background: Immunohistochemical studies use antibodies to stain tissues with the goal of quantifying protein expression. However, protein expression is often heterogeneous, resulting in variable degrees and patterns of staining. This problem is particularly acute in prostate cancer, where tumors are infiltrative and heterogeneous in nature. In this article, we introduce analytic approaches that explicitly consider both the frequency and intensity of tissue staining.

Methods: Compositional data analysis is a technique used to analyze vectors of unit-sum proportions, such as those obtained from soil sample studies or species abundance surveys. We summarized specimen staining patterns by the proportion of cells staining at mild, moderate, and intense levels and used compositional data analysis to summarize and compare the resulting staining profiles.

Results: In a study of syndecan-1 staining patterns among 44 localized prostate cancer cases with Gleason score 7 disease, compositional data analysis did not detect a statistically significant difference between the staining patterns in recurrent (n = 22) versus nonrecurrent (n = 22) patients. Results indicated only modest increases in the proportion of cells staining at a moderate intensity in the recurrent group. In contrast, an analysis that compared quantitative scores across groups indicated a (borderline) significant increase in staining in the recurrent group (P = 0.05, t test).
Conclusions: Compositional data analysis offers a novel analytic approach for immunohistochemical studies, providing greater insight into differences in staining patterns between groups, but possibly lower statistical power than existing, score-based methods. When appropriate, we recommend conducting a compositional data analysis in addition to a standard score-based analysis. (Cancer Epidemiol Biomarkers Prev 2005;14(5):1040–6)

Background

Immunohistochemical studies are designed to detect the expression of proteins at the cellular level. In such studies, an antibody is applied to a tissue specimen, and antibody binding is detected using a chromogenic substrate that results in a color change where the protein is localized. The specimen is then examined under a microscope, and the extent and intensity of color staining are assessed by an observer.

Immunohistochemical studies have become an integral part of the process of biomarker development, where they are used to determine whether overexpression or underexpression of potential markers predicts disease status. For example, several novel proteins have been shown to be markedly overexpressed in prostate cancer tissue, including α-methylacyl-CoA racemase (1), EZH2 (2), and pim-1 kinase and hepsin (3). cDNA microarray studies typically yield many candidate genes that seem to differentiate between disease groups at the RNA level. However, only a small portion of these findings will ultimately be confirmed by immunohistochemistry at the protein level.

Immunohistochemical studies of prostate cancer reveal tremendous within-specimen heterogeneity. Not only are normal cells typically present in cancer specimens, but staining among tumor cells can be highly variable. Figure 1 shows a section of formalin-fixed, paraffin-embedded prostate tissue stained with the syndecan-1 antibody (4).
Both normal and cancer cells are visible; moreover, some areas of the tumor show mild to moderate staining, whereas others show very intense staining.

Because of variability like that illustrated in Fig. 1, specimens are often not coded as positive or negative for a particular antibody; rather, a summary score is calculated (1, 4). First, the percent of cells staining at each of a fixed number of intensity levels (e.g., none, faint, moderate, strong) is determined. We refer to this vector of percents as the staining profile associated with a specific specimen. The score is then the sum, over intensity levels, of the percent staining at each level multiplied by an ordinal value corresponding to that level (0 = no staining, 1 = mild staining, etc.). With four intensity levels, the resulting score ranges from 0 (no staining in the entire specimen) to 300 (intense staining uniformly within the specimen). Variants of this system have been proposed; for example, instead of the percent staining at each intensity level, an integer may be recorded with values (0, 1, 2, ...) indicating whether the percent staining at a given level falls within a specified range (0-20%, 20-40%, 40-60%, ...; ref. 5). Qualitative approaches, which classify specimens as positive/negative or weak-strong, have also been used (2, 3).

The transformation from staining profile to score is many to one: different profiles can yield the same score. Consider, for example, the two specimen profiles in Table 1. The first profile shows increased mild to moderate staining relative to the second profile, which has more intense staining. However, the two sample specimens in Table 1 yield the same score (80), because the increased frequency of cells staining moderately in the first sample compensates for the small absolute increase in intense expression in the second sample. Therefore, if differences such as those represented in Table 1 are of interest, a score-based analysis may not be adequate.
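The score calculation and its many-to-one property can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name staining_score, the level labels, and both profiles are assumptions made for the example. In particular, the two profiles are hypothetical values chosen so that their scores coincide; they are not the Table 1 data.

```python
# Ordinal weights for the four intensity levels described in the text
# (0 = no staining, 1 = mild, 2 = moderate, 3 = intense).
INTENSITY_WEIGHTS = {"none": 0, "mild": 1, "moderate": 2, "intense": 3}


def staining_score(profile):
    """Return the sum over levels of (percent staining x ordinal weight).

    `profile` maps each intensity level to the percent of cells staining
    at that level; the percents must sum to 100.
    """
    if set(profile) != set(INTENSITY_WEIGHTS):
        raise ValueError("profile must give a percent for every intensity level")
    total = sum(profile.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"percents must sum to 100, got {total}")
    return sum(INTENSITY_WEIGHTS[level] * pct for level, pct in profile.items())


# Hypothetical profiles (NOT the Table 1 values): the first has more mild and
# moderate staining, the second has more intense staining.
profile_a = {"none": 40, "mild": 40, "moderate": 20, "intense": 0}
profile_b = {"none": 65, "mild": 5, "moderate": 15, "intense": 15}

score_a = staining_score(profile_a)  # 0*40 + 1*40 + 2*20 + 3*0
score_b = staining_score(profile_b)  # 0*65 + 1*5 + 2*15 + 3*15
print(score_a, score_b)  # both profiles give the same score
```

Because the score collapses a four-part profile into a single number, any analysis based on it cannot distinguish profile_a from profile_b, which is the limitation that motivates analyzing the profiles directly.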
Received 8/4/04; revised 2/9/05; accepted 3/2/05.
Grant support: Grant P50 CA97186.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Requests for reprints: Ruth Etzioni, Translational and Outcomes Research, Fred Hutchinson Cancer Research Center, Mailstop M2-B230, 1100 Fairview Avenue North, P.O. Box 19024, Seattle, WA 98109. Phone: 206-667-6561; Fax: 206-667-7264. E-mail: retzioni@fhcrc.org
Copyright © 2005 American Association for Cancer Research.

This article presents an approach for directly analyzing data on the staining profiles recorded in immunohistochemical