Modeling Spatial Variation in Leukemia Survival Data
Robin Henderson, Silvia Shimakura, and David Gorst
In this article we combine ideas from spatial statistics with lifetime data analysis techniques to investigate possible spatial variation in survival of adult acute myeloid leukemia patients in northwest England. Exploratory analysis suggests both clinically and statistically significant variation in survival rates across the region. A multivariate gamma frailty model incorporating spatial dependence is proposed and applied, with results confirming the dependence of hazard on location.
KEY WORDS: Cancer; Frailty; Geostatistics; Hierarchic model; Latent process; Semiparametric model.
1. INTRODUCTION
Although leukemia survival rates continue to improve as more effective therapies are introduced, considerable between-patient heterogeneity remains conditional on treatment and known prognostic factors (see, e.g., Cassileth et al. 1992; Estey, Shen, and Thall 2001; Schoch et al. 2001). In this article we investigate whether at least part of this heterogeneity might be linked to spatial effects, using data maintained by the North West Leukemia Register in the United Kingdom. This is a high-quality database that holds records of incidence and subsequent survival status of all leukemia cases in northwest England. In a previous informal study, Gorst (1995) suggested that there could be district-to-district variation in survival rates above and beyond what might be expected to occur by chance alone. Such a finding, if substantiated, would be of considerable interest. It could be due to patient management differences between treatment centers, which could have an important influence on future clinical practice, or due to background variation in population or environmental characteristics, necessitating further epidemiologic study.
Robin Henderson is Reader, Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, U.K. (E-mail: Robin.Henderson@lancaster.ac.uk). Silvia Shimakura is Lecturer, Departamento de Estatística, Universidade Federal do Paraná, Caixa Postal 19081, 81531-990, Curitiba, PR, Brazil (E-mail: Silvia.Shimakura@est.ufpr.br). David Gorst is Consultant Haematologist, Department of Haematology, Royal Lancaster Infirmary, Lancaster LA1 4RP, U.K. This work is supported in part by the CAPES Foundation, Brazil. The authors thank the editor and three reviewers for constructive comment on earlier versions of this article. Silvia Shimakura was sponsored by CAPES grant BEX1139/96-7.
We investigate whether the survival distribution for acute myeloid leukemia (AML) in adults is homogeneous across the region after allowing for known risk factors. We use register data on the 1,043 cases recorded between 1982 and 1998. AML represents the biggest single category of adult leukemia in the register. Figure 1 shows residential locations of the AML cases in the study period, together with the 24 administrative districts that make up the region. The boxed area is 100 km × 120 km, and the numbers are district identifiers, used for later reference. Apparent clustering is of course due in large part to the population distribution.
In this work we do not discuss the detection and modeling of spatial variation in disease incidence, for which there are now well-established methods (Elliott, Wakefield, Best, and Briggs 2000). Instead, we concentrate on subsequent survival by extending standard survival models to the spatial setting. The simple choropleth map in Figure 2 of estimated relative risks between districts, explained more fully later, suggests substantial variability between districts and also some apparent clustering of districts with similar risks. There seems to be a region of high risk running from northeast to southwest, with a low-risk region to the west.
To investigate, we adopt a multivariate frailty approach that incorporates the effects of known covariates, individual heterogeneity, and spatial traits. Our ultimate goal is to model possible residual spatial variation in survival after accounting for known subject-specific prognostic factors and unobserved individual heterogeneity.
The article is organized as follows. In Section 2 we summarize an initial survival analysis of the AML data using standard univariate methods based on a Cox model with and without frailty. In Section 3 we investigate possible variation in survival across the region after allowing for covariate effects, using a lattice structure based on the 24 districts. We use a Bayesian hierarchic multivariate gamma model and Markov chain Monte Carlo (MCMC) methodology for estimation, and the deviance information criterion (DIC) (Spiegelhalter et al. 2002) to compare competing models. In Section 4 we take an alternative approach, using the exact locations of the subjects' residences rather than knowledge only of their district, using an additive gamma frailty model that allows a proportion of the total frailty to be explained by a spatially varying component. We provide closing remarks and conclusions in Section 5.
2. INITIAL SURVIVAL ANALYSIS
To set the scene, we begin with a standard survival analysis ignoring any spatial variation across the region. Data consist of observation times t, death/censoring indicators δ, and covariates x for 1,043 patients. Median survival time was just over 6 months, though some patients survived for more than 13 years. Some 16% of cases were censored. Complete information is available for four covariates: age; sex (0 = F, 1 = M); white blood cell count (WBC) at diagnosis, truncated at 500 units with 1 unit = 50 × 10^9/L; and a measure of deprivation for the enumeration district of residence.
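
As an illustration of the data structure just described, the following minimal sketch shows how a median survival time can be estimated from observation times and death/censoring indicators when some cases are censored. It uses the Kaplan-Meier product-limit estimator, a standard method that the text does not name explicitly, so its use here is an assumption. The function names `kaplan_meier` and `median_survival` and the toy values are hypothetical and are not taken from the register.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    times  : observation times t (death or censoring time)
    events : indicators, 1 = death observed, 0 = censored
    Returns (t, S(t)) at each distinct time with at least one death.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same observation time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Deaths and censored cases both leave the risk set after t.
        n_at_risk -= leaving
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # survival never reaches 0.5 (heavy censoring)


# Toy data only (hypothetical): eight subjects, three of them censored.
toy_times = [1, 2, 2, 3, 4, 5, 6, 8]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0]
print(median_survival(kaplan_meier(toy_times, toy_events)))
```

On this toy data the estimated median is 5, the first time at which the estimated survival falls to 0.5 or below. Covariates play no part in this estimate; they enter only through the regression models, which is where the measure of deprivation is needed.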
© 2002 American Statistical Association Journal of the American Statistical Association December 2002, Vol. 97, No. 460, Application and Case Studies DOI 10.1198/016214502388618753
For the deprivation measure, we use the Townsend score, a quantitative measure with a range of -7 to 10 in the AML data, with higher values indicating less affluent areas (Townsend, Phillimore, and Beattie 1988). The Townsend score is available for each of the 8,131 enumeration