author’s accepted manuscript
Original article: B. Jurish, “Diachronic Collocations, Genre, and DiaCollo.” In R. J. Whitt (ed.), Diachronic Corpora, Genre, and Language Change, John Benjamins, Amsterdam, 2018. DOI 10.1075/scl.85.03jur
This article is subject to copyright restrictions. The publisher should be contacted for permission to re-use or reprint the material in any form.

Diachronic Collocations, Genre, and DiaCollo
REVISED DRAFT
Bryan Jurish
Berlin-Brandenburgische Akademie der Wissenschaften
jurish@bbaw.de

Abstract
This chapter presents the formal basis for diachronic collocation profiling as implemented in the open-source software tool “DiaCollo” and sketches some potential applications to multi-genre diachronic corpora. Explicitly developed for the efficient extraction, comparison, and interactive visualization of collocations from a diachronic text corpus, DiaCollo is suitable for processing collocation pairs whose association strength depends on extralinguistic features such as the date of occurrence or text genre. By tracking changes in a word’s typical collocates over time, DiaCollo can help to provide a clearer picture of diachronic changes in the word’s usage, especially those related to semantic shift or discourse environment. Use of the flexible DDC search engine¹ as a back-end allows user queries to make explicit reference to genre and other document-level metadata, thus enabling, e.g., independent genre-local profiles or cross-genre comparisons. In addition to traditional static tabular display formats, a web-service plugin also offers a number of intuitive interactive online visualizations of diachronic profile data for immediate inspection.
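The following minimal Python sketch illustrates, under simplifying assumptions, what it means for a collocation pair's association strength to depend on its date of occurrence: documents are grouped into time slices of a user-chosen width (the granularity of the diachronic axis), and a pointwise mutual information score in the spirit of Church and Hanks (1990) is computed separately within each slice. The function names `slice_of` and `diachronic_pmi`, the input format of `(year, tokens)` pairs, the symmetric token window, the choice of PMI as the association measure, and the toy documents are all hypothetical illustrations; this is not DiaCollo's actual implementation or API.

```python
from collections import Counter
from math import log2


def slice_of(year, width):
    """Map a year to the first year of its time slice (e.g. width=10 gives decades)."""
    return year - year % width


def diachronic_pmi(docs, target, width=10, window=2):
    """Return {slice: {collocate: PMI}} for `target`, computed separately per slice.

    docs   -- iterable of (year, tokens) pairs (hypothetical input format)
    width  -- slice width in years, i.e. the granularity of the diachronic axis
    window -- tokens on each side of `target` counted as a co-occurrence
    """
    # Group token lists by time slice.
    by_slice = {}
    for year, tokens in docs:
        by_slice.setdefault(slice_of(year, width), []).append(tokens)

    profiles = {}
    for sl, texts in by_slice.items():
        word_freq = Counter()  # f(w): unigram frequency within this slice
        pair_freq = Counter()  # f(target, w): window co-occurrence frequency
        n = 0                  # N: total number of tokens in this slice
        for tokens in texts:
            word_freq.update(tokens)
            n += len(tokens)
            for i, tok in enumerate(tokens):
                if tok != target:
                    continue
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        pair_freq[tokens[j]] += 1
        f_target = word_freq[target]
        if f_target == 0:
            continue  # target unattested in this slice: no profile for it
        # PMI with maximum-likelihood estimates: log2(f(x,y) * N / (f(x) * f(y)))
        profiles[sl] = {
            w: log2(f_xy * n / (f_target * word_freq[w]))
            for w, f_xy in pair_freq.items()
        }
    return profiles


# Hypothetical toy corpus sketching a usage shift of "broadcast".
docs = [
    (1850, "farmers broadcast seed over the field".split()),
    (1860, "they broadcast seed by hand".split()),
    (1950, "the station will broadcast news on radio".split()),
    (1960, "radio stations broadcast news daily".split()),
]

profile = diachronic_pmi(docs, "broadcast", width=100)
for sl in sorted(profile):
    ranked = sorted(profile[sl].items(), key=lambda kv: kv[1], reverse=True)
    print(sl, [w for w, _ in ranked])
```

With a slice width of 100 years, "seed" appears only in the profile for the 1800 slice and "news" and "radio" only in the 1900 slice. Keying the grouping on a document's genre label instead of its date would give genre-local profiles of the kind mentioned above; a practical tool would also need frequency thresholds or a different association measure, since raw PMI is known to favor rare collocates.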
1 Introduction
DiaCollo is an open-source software tool for automatic collocation profiling (Church and Hanks 1990; Evert 2005) in diachronic corpora such as the Deutsches Textarchiv² (Geyken 2013) or the Corpus of Historical American English³ (Davies 2012). It allows users to choose the projected collocate attributes and the granularity of the diachronic axis on a per-query basis (Jurish 2015; Jurish et al. 2016). Unlike conventional collocation extractors such as DWDS Wortprofil (Didakowski and Geyken 2013) or Sketch Engine (Kilgarriff and Tugwell 2002), DiaCollo is suitable for the extraction and analysis of diachronic collocation data, i.e. collocation pairs whose association strength depends on the date of their occurrence and/or on other extralinguistic features such as author or genre. By tracking changes in a word’s typical collocates over time or across corpus subsets, and by applying J. R. Firth’s famous principle that “you shall know a word by the company it keeps” (Firth 1957), DiaCollo can help to provide a clearer picture of associated changes in the word’s usage. Developed in the context of the European Union CLARIN project⁴ to aid historians in analyzing changes in the discourse topics associated with selected terms, as manifested by changes in those terms’ context distributions, DiaCollo has been successfully applied to

¹ “DWDS/Dialing Concordance”, http://sourceforge.net/projects/ddc-concordance
² http://www.deutschestextarchiv.de
³ http://corpus.byu.edu/coha
⁴ http://www.clarin.eu