Journal of Data Science 10(2012), 579-595

A Model for Spatially Disaggregated Trends and Forecasts of Diabetes Prevalence

Peter Congdon
Queen Mary University of London

Abstract: A multilevel model (allowing for individual risk factors and geographic context) is developed for jointly modelling cross-sectional differences in diabetes prevalence and trends in prevalence, and then adapted to provide geographically disaggregated diabetes prevalence forecasts. This involves a weighted binomial regression applied to US data from the Behavioral Risk Factor Surveillance System (BRFSS) survey, specifically totals of diagnosed diabetes cases, and populations at risk. Both cases and populations are disaggregated according to survey year (2000 to 2010), individual risk factors (e.g., age, education), and contextual risk factors, namely US census division and the poverty level of the county of residence. The model includes a linear growth path in decadal time units, and forecasts are obtained by extending the growth path to future years. The trend component of the model controls for interacting influences (individual and contextual) on changing prevalence. Prevalence growth is found to be highest among younger adults, among males, and among those with high school education. There are also regional shifts, with a widening of the US “diabetes belt”.

Key words: Context, diabetes, forecasts, prevalence, risk factor.

1. Introduction

A number of nationwide forecasts of diabetes prevalence in the US have been produced, and predict a continued rise in prevalence, related to factors such as rising obesity and differential growth in minority groups more prone to the condition (Huang et al., 2009; Boyle et al., 2010). However, there are wide variations in diabetes prevalence between different parts of the US (e.g., Barker et al., 2011; Ford et al., 2005), and geographically disaggregated forecasts are important for planning public health interventions.

This paper describes a method for analyzing recent geographic trends in prevalence using health survey data, and for projecting those trends into the future. The model used includes parameters to represent the impact on cross-sectional