Match Rate and Positional Accuracy of Two Geocoding Methods for Epidemiologic Research

F. BENJAMIN ZHAN, PHD, JEAN D. BRENDER, PHD, IONARA DE LIMA, MA, LUCINA SUAREZ, PHD, AND PETER H. LANGLOIS, PHD

PURPOSE: This study compares the match rate and positional accuracy of two geocoding methods: the popular geocoding tool in ArcGIS 9.1 and the Centrus GeoCoder for ArcGIS.

METHODS: We first geocoded 11,016 Texas addresses in a case–control study using both methods and obtained the match rate of each method. We then randomly selected 200 addresses from those geocoded by both methods and obtained their geographic coordinates with a global positioning system (GPS) device. Of the 200 addresses, 110 were case maternal residence addresses and 90 were control maternal residence addresses. These GPS-surveyed coordinates were used as the ‘‘true’’ coordinates to calculate positional errors of geocoded locations. We used the Wilcoxon signed rank test to evaluate whether the differences in positional errors between the two methods were significantly different from zero. In addition, we calculated the sensitivity and specificity of the two methods for classifying maternal addresses as within 1500 m of toxic release inventory facilities when distance is used as a proxy of exposure.

RESULTS: The match rate of the Centrus GeoCoder was more than 10% greater than that of the geocoding tool in ArcGIS 9.1. Positional errors with the Centrus GeoCoder were smaller than those of the geocoding tool in ArcGIS 9.1, and this difference was statistically significant. Sensitivity and specificity of the two methods were similar.

CONCLUSIONS: The Centrus GeoCoder for ArcGIS gives greater match rates than the geocoding tool in ArcGIS 9.1. Although the Centrus GeoCoder has better positional accuracy, both methods give similar results in classifying maternal addresses as within 1500 m of toxic release inventory facilities when distance is used as a proxy of exposure.
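The quantities named in the abstract (match rate, positional error, and sensitivity and specificity at a 1500 m threshold) can be illustrated with a minimal sketch. It is not the authors' procedure: the haversine great-circle formula, the spherical Earth radius, every coordinate, and all function names below are assumptions made for illustration only. The abstract does not state how distances were computed.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical model is an assumption


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_rate(n_matched, n_total):
    """Percentage of submitted addresses that were geocoded."""
    return 100.0 * n_matched / n_total


def sensitivity_specificity(truth, predicted):
    """Compare a classification against the reference ('true') classification."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: one facility and four addresses (GPS vs. geocoded).
facility = (30.000, -97.000)
gps_points = [(30.005, -97.0), (30.012, -97.0), (30.020, -97.0), (30.030, -97.0)]
geocoded = [(30.006, -97.0), (30.015, -97.0), (30.021, -97.0), (30.029, -97.0)]

# Positional error: distance from each geocoded point to its GPS point.
errors_m = [haversine_m(*g, *t) for g, t in zip(geocoded, gps_points)]

# Exposure classification: is the address within 1500 m of the facility?
THRESHOLD_M = 1500.0
truth = [haversine_m(*t, *facility) <= THRESHOLD_M for t in gps_points]
predicted = [haversine_m(*g, *facility) <= THRESHOLD_M for g in geocoded]
sens, spec = sensitivity_specificity(truth, predicted)

print([round(e, 1) for e in errors_m], sens, spec)
```

With two such error lists, one per geocoding method and paired by address, a paired test such as the Wilcoxon signed rank test could then be applied, for instance with SciPy's `scipy.stats.wilcoxon`.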
Ann Epidemiol 2006;16:842–849. © 2006 Elsevier Inc. All rights reserved.

KEY WORDS: Global positioning system, Geocoding, Address Matching, Epidemiology.

From the Department of Geography, Texas Center for Geographic Information Science, Texas State University, San Marcos, TX (F.B.Z., I.D.L.); College of Resources and Environmental Science, Wuhan University, Wuhan, China (F.B.Z.); Department of Epidemiology and Biostatistics, Texas A&M School of Rural Public Health, College Station, TX (J.D.B.); and Epidemiology and Disease Surveillance Unit (L.S.) and Texas Center for Birth Defects Research and Prevention (P.H.L.), Texas Department of State Health Services, Austin, TX.

Address correspondence to: F. Benjamin Zhan, Texas Center for Geographic Information Science (TxGISci), Department of Geography, Texas State University, San Marcos, TX 78666. Tel.: (512) 245-8846; fax: (512) 245-8353. E-mail: zhan@txstate.edu.

This study was supported in part by cooperative agreement U50/CCU613232 from the Centers for Disease Control and Prevention and contract 7547547549 from the Texas Department of State Health Services Center for Birth Defects Research and Prevention.

Any mention of a software package or a product does not constitute an endorsement of any product by Texas State University-San Marcos, Texas A&M Health Science Center, the Texas Department of State Health Services, or Wuhan University. The authors have no commercial or any other interests in the companies or software packages mentioned in the article.

Received January 10, 2006; accepted June 21, 2006.

1047-2797/06/$–see front matter. 360 Park Avenue South, New York, NY 10010. doi:10.1016/j.annepidem.2006.08.001

INTRODUCTION

Geocoding, a process of assigning geographic coordinates (e.g., latitude and longitude) to locations based on street addresses, has been used increasingly in epidemiologic research to examine the relation between potential environmental exposures and health effects. The validity of epidemiologic research depends on the match rate of geocoding (the percentage of addresses geocoded), as well as the positional accuracy of the locations of geocoded addresses. In this study, we define positional accuracy as the difference between the geographic location of a geocoded address and the ‘‘true’’ ground location of that address determined by a field survey method, i.e., surveying with a global positioning system (GPS) device.

Despite its long history, geocoding remains a labor-intensive and time-consuming process in health-related research. In recent years, a number of researchers have published findings about either the match rate or the positional accuracy of different geocoding methods (1–8); none addressed both. As a consequence, epidemiologists have very little information to rely on when selecting a geocoding tool for their research.

The purpose of this study is to compare the match rate and positional accuracy of a popular geocoding tool available in ArcGIS 9.1 from Environmental Systems Research Institute (ESRI; Redlands, CA) with another geocoding