Environmental Research 101 (2006) 256–262

Comparison of residential geocoding methods in population-based study of air quality and birth defects

Suzanne M. Gilboa a,*, Pauline Mendola b, Andrew F. Olshan a, Catherine Harness c, Dana Loomis a, Peter H. Langlois d, David A. Savitz a, Amy H. Herring e

a Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
b Human Studies Division, National Health and Environmental Effects Research Laboratory, United States Environmental Protection Agency, Research Triangle Park, NC, USA
c Computer Sciences Corp., Durham, NC, USA
d Birth Defects Epidemiology and Surveillance Branch, Texas Department of State Health Services, Austin, TX, USA
e Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA

Received 5 May 2005; received in revised form 23 December 2005; accepted 9 January 2006
Available online 17 February 2006

Abstract

Our population-based case–control study of air quality and birth defects in Texas relied on geocoding the maternal residence recorded on vital records to assign air pollution exposures during early pregnancy. We attempted to geocode the maternal addresses of 5338 birth defect cases and 4574 frequency-matched controls using an automated procedure with standard matching criteria in ArcGIS 8.2 and 8.3. Initially, we matched 7266 observations (73%). To increase the proportion of successful matches, we applied an interactive procedure to the 2646 addresses that the software had not geocoded. This yielded an additional 985 matches (37% of those 2646). Using the same 2646 initially unmatched addresses, we compared the results of this interactive procedure with those of an automated procedure using lower matching standards. The automated procedure with lower standards yielded more matches (n = 1559, 59%), but their accuracy was questionable. We included the interactively geocoded observations in our final data set.
Their inclusion did not affect the estimates of air pollution exposure but increased our statistical power to detect associations between air quality and risk of selected birth defects. The geocoded and not geocoded populations differed in the distribution of Latino ethnicity (51% vs 59%, respectively), and ethnicity was independently associated with air pollution exposures (P < 0.05). Geocoding status also appeared to modify the association between ethnicity and risk of birth defects: Latina women appeared to have a slightly lower risk of birth defects than non-Latina women in the geocoded population and a slightly higher risk in the not geocoded population. Incomplete geocoding may have resulted in selection bias because of the underrepresentation of Latinas in our study population.

© 2006 Elsevier Inc. All rights reserved.

Keywords: Geographic information systems; Bias (epidemiology)

1. Introduction

The use of geographic information systems (GIS) is becoming increasingly popular in environmental epidemiology (Nuckols et al., 2004). Geocoding, also called address matching, is one of the many tools available to researchers in a variety of GIS software applications. It assigns latitude and longitude coordinates to addresses by linking them to a reference theme or electronic street map that contains both address and geographic information (Bonner et al., 2003; Cayo and Talbot, 2003; Vine et al., 1997). Matching rates are typically 40–80% with commercially available software (Krieger et al., 2001). Only recently has the public health literature addressed the accuracy of geocoding methods (Bonner et al., 2003; Cayo and Talbot, 2003); one recent study compared the match rates, accuracy, and repeatability of commercial geocoding services and found that all three measures needed to be considered when choosing a geocoding method (Whitsel et al., 2004).

ISSN 0013-9351; www.elsevier.com/locate/envres
doi:10.1016/j.envres.2006.01.004
* Corresponding author. Fax: +1 404 498 3040. E-mail address: sgilboa@cdc.gov (S.M. Gilboa).
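The match percentages in the Abstract use two different denominators: the first-pass rate is computed over all 9912 submitted addresses (5338 cases plus 4574 controls), whereas the 37% and 59% figures are computed over the 2646 addresses left unmatched by the first pass. The short Python check below recomputes them from the reported counts. The variable names are illustrative, and the combined total of 8251 geocoded addresses (about 83%) is derived arithmetic that assumes the final data set is the first pass plus the interactive matches; it is not a figure stated in this excerpt.

```python
# Counts reported in the Abstract.
cases = 5338
controls = 4574
total = cases + controls                                   # all submitted addresses

automated_standard = 7266                                  # first automated pass, standard criteria
unmatched_after_first_pass = total - automated_standard    # addresses sent to the second step
interactive_extra = 985                                    # interactive review of those addresses
automated_lower = 1559                                     # alternative: automated pass, lower standards

# Derived (assumption): final data set = first pass + interactive matches.
final_geocoded = automated_standard + interactive_extra


def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print(total, unmatched_after_first_pass)                         # 9912 2646
print(pct(automated_standard, total))                            # 73
print(pct(interactive_extra, unmatched_after_first_pass))        # 37
print(pct(automated_lower, unmatched_after_first_pass))          # 59
print(final_geocoded, pct(final_geocoded, total))                # 8251 83
```

Keeping the denominators explicit makes clear that the 37% and 59% rates are not directly comparable to the 73% first-pass rate.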
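The Introduction describes geocoding as linking each address to a reference street map that carries both address and geographic information. A common way to do this is address-range interpolation along street segments, and the Python sketch below assumes that technique; it is not necessarily how ArcGIS 8.2/8.3 performs matching. The segment, its coordinates, and the names StreetSegment, interpolate, geocode, and REFERENCE are hypothetical. The sketch deliberately ignores street sides (odd/even ranges), setback offsets, and fuzzy name scoring, all of which production geocoders handle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


@dataclass
class StreetSegment:
    """Hypothetical reference-map record: one street segment with its address range."""
    street: str      # normalized street name, e.g. "MAIN ST"
    from_addr: int   # house number at the segment's start node
    to_addr: int     # house number at the segment's end node
    start: Point     # coordinates of the start node
    end: Point       # coordinates of the end node


def interpolate(segment: StreetSegment, house_number: int) -> Optional[Point]:
    """Place a house number proportionally along the segment; None if out of range."""
    lo, hi = segment.from_addr, segment.to_addr
    if not min(lo, hi) <= house_number <= max(lo, hi):
        return None
    frac = 0.0 if hi == lo else (house_number - lo) / (hi - lo)
    lon = segment.start[0] + frac * (segment.end[0] - segment.start[0])
    lat = segment.start[1] + frac * (segment.end[1] - segment.start[1])
    return (lon, lat)


def geocode(address: str, reference: List[StreetSegment]) -> Optional[Point]:
    """Match 'NUMBER STREET' against the reference map; None means unmatched."""
    number_text, _, street = address.strip().partition(" ")
    if not number_text.isdigit():
        return None  # no leading house number, so nothing to interpolate
    street = " ".join(street.upper().split())  # crude normalization: case and spacing only
    for segment in reference:
        if segment.street == street:
            point = interpolate(segment, int(number_text))
            if point is not None:
                return point
    return None  # unmatched: a candidate for manual (interactive) review


# Hypothetical reference theme containing a single segment.
REFERENCE = [StreetSegment("MAIN ST", 100, 198, (-97.750, 30.270), (-97.740, 30.270))]

print(geocode("149 Main St", REFERENCE))  # roughly (-97.745, 30.27)
print(geocode("500 Main St", REFERENCE))  # None: outside every address range
```

An address that cannot be placed on any segment returns None, the analogue of the addresses that the automated procedure in the Abstract left for interactive review.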