Googleology as Smart Lexicography: Big Messy Data for Better Regional Labels¹

Stefan Dollinger
Gothenburg University and University of British Columbia
stefan.dollinger@sprak.gu.se

Dictionaries: Journal of the Dictionary Society of North America 37 (2016), 60–98

¹ I am grateful to two highly knowledgeable and supportive reviewers, who saw the potential of this paper even in its first draft. I further owe thanks to a number of colleagues for sharing references and expertise during a prolonged writing process: Paul Cook, Steve Kleinedler, Michael Hancher, Lisa Berglund, Sidney Landau, Tom Zurinskas, Edward Finegan, Andrew Hawke, Charlotte Brewer, and Stefan Th. Gries. The usual disclaimers apply.

Abstract

One of the biggest desiderata in practical lexicography is the labeling of lexemes by region in the widest sense of that word. In a highly mobile world, regional labeling is bound to receive more attention, yet it is perhaps the least precise aspect of English dictionaries more generally. Largely comprising two groups—national terms, such as Americanism or Briticism, and regional terms of a certain more local provenance, such as Southwestern Ontario or Scottish—regional labeling in English dictionaries suffers from both a theoretical neglect and a practical lack of adequate data for dictionary editors to use easily in assessing a term's geographical dimensions. This paper describes a method developed for the second edition of the Dictionary of Canadianisms on Historical Principles (DCHP-2). Using site-restricted web searches in combination with long-term web monitoring, the method rests on a normalization routine that produces "Frequency Indices" that are comparable across domains. Counter to recent lexicographic best practices, it is shown that web-scaled resources—generally preferred by computational linguists and computational lexicographers as