13 (2) 3

‘Philological computing’ vs. ‘philological outsourcing’ and the compilation of historical corpora: a Late Modern English test case¹

Stefan Dollinger, University of Vienna

1. Historical corpus linguistics: Corpora galore²

The corpus-based study of the history of English is today not only a well-established but also a thriving field (e.g. Rissanen et al. 1992, Raumolin-Brunberg 2002). Historical corpus linguistics got off to a somewhat belated start, since the first machine-readable corpora of Present-Day English (PDE) were already available in the early 1960s. Nevertheless, it is alive and well, as the entries in the Oxford Text Archive and the third edition of the ICAME CD-ROM collection of English corpora, currently being compiled, attest. It may therefore be time to review some of the corpus compilation practices that seem to have become accepted in the field since the ground-breaking Helsinki Corpus was completed in the late 1980s. Since then, a number of historical corpora have been compiled. Some have already been made publicly available, while others are still in progress. Adapting Kennedy’s (1998) periodization of machine-readable corpora to historical corpus linguistics, we may call the younger ‘post-Helsinki’ corpora second-generation corpora. This division rests on the fact that second-generation corpora fill temporal, geographical or sociolinguistic gaps left by the Helsinki Corpus (HC). Examples include the lack of 18th- and 19th-century data, the geographical limitation to texts from Britain, and the absence of data from lower-class speakers.

¹ I would like to thank Ingrid Tieken-Boon van Ostade for her valuable comments on an earlier version of this paper. Needless to say, all remaining faults are mine alone. Author’s email for correspondence: stefan.dollinger@univie.ac.at

² This compound is taken from the title of the conference proceedings of ICAME 19 (1998), edited by John M. Kirk.