A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters

LUIZ DUCZMAL * and RENATO ASSUNÇÃO

Department of Statistics – ICEx – UFMG
Laboratório de Estatística Espacial (LESTE) and Centro de Estudos de Criminalidade e Segurança Pública (CRISP)
30161-970 – Belo Horizonte – MG, Brazil

ABSTRACT

We propose a new graph-based strategy for the detection of spatial clusters of arbitrary geometric form in a map of geo-referenced populations and cases. Our test statistic is based on the likelihood ratio test previously formulated by Kulldorff and Nagarwalla for circular clusters. A new technique of adaptive simulated annealing is developed, focused on the problem of finding the local maxima of a certain likelihood function over the space of the connected subgraphs of the graph associated with the regions of interest. Given a map with n regions, the algorithm finds a quasi-optimal solution after analyzing, on average, s n log(n) subgraphs, where s depends on the uniformity of the case density across the map. The algorithm is applied to a study of the detection of homicide clusters in a large Brazilian metropolitan area.

KEYWORDS: Spatial cluster detection, simulated annealing, likelihood ratio test, disease clusters, hot-spot detection.

1. INTRODUCTION

Since the 1980s there has been increasing interest in the identification of spatially localized adverse health risk conditions. Such clustering can arise for various reasons. It can be due to environmental causes concentrated in small regions, such as localized pollution sources (Biggeri et al., 1996; Katsouyanni et al., 1991; Xu et al., 1989). Another possible reason is differences among populations in their genetic composition or in social habits such as diet (Barbujani and Sokal, 1990; Walsh and DeChello, 2001).
Other possibilities include differences in regional medical services, such as the ascertainment of new cases or disease treatment protocols (Karjalainen, 1990; Goodwin et al., 1998), or a viral agent generating clustering patterns (Kinlen, 1995).

A number of methods have been proposed to test for the presence of spatial clusters of elevated risk and to identify their locations. Thorough and recent reviews can be found in Lawson et al. (1999), where many different methods are compared.

* Corresponding author e-mail: duczmal@est.ufmg.br
Luiz Duczmal, UFMG – Departamento de Estatística, Caixa Postal 702, Belo Horizonte, MG, 30161-970, Brazil. FAX: 55-31-3499-5924

The methods assume that we have at our disposal a map of regions, each with a defined population at risk and a certain number of observed cases. The cases correspond to the individuals in each population that have a special designation, such as an infected individual or a crime