A final draft of: Archer, Dawn and Jonathan Culpeper (2003) 'Sociopragmatic annotation: New directions and possibilities in historical corpus linguistics'. In: A. Wilson, P. Rayson and A. M. McEnery (eds.) Corpus Linguistics by the Lune: A Festschrift for Geoffrey Leech, Peter Lang: Frankfurt/Main, 37-58. Note it may contain minor errors and infelicities.

Sociopragmatic annotation: New directions and possibilities in historical corpus linguistics

Dawn Archer and Jonathan Culpeper
Lancaster University¹

1. Introduction

Perhaps the most striking fact about Geoffrey Leech is his almost uncanny ability to move from field to field, undertaking definitive work in each. The general aim of this paper is to show how the corpus approach can be used in pragmatics research. It is fitting that this paper brings together two fields in which Geoffrey Leech's work is especially prominent. Moreover, this particular interface represents one of his current research interests.

This paper will describe how the fields our work straddles – pragmatics, historical linguistics, sociolinguistics, and corpus linguistics – each have their own research goals and methodological preferences and problems, which, when combined, present our work with a particular set of difficulties. At the heart of these difficulties lies the issue of context. Our aim in this paper is to demonstrate how a sophisticated annotation scheme can help bridge the gap between text and contexts, and thus further research in sociopragmatics, specifically, historical sociopragmatics. Our data covers a 120-year time span (1640-1760), and consists of more than 240,000 words drawn from two text-types: trial proceedings and drama. The fact that it is historical data produces a further set of problems regarding the reconstruction of the historical social context, and we will also address these problems in the course of this paper.
We will describe how the annotation system that we have developed:

• accommodates the investigation of language set in various context(s) (for example, speaker/hearer relationships, social roles, and sociological characteristics such as gender), and
• treats contexts as dynamic (cf. other annotation systems, such as the spoken sub-section of the BNC, which concentrate upon the relatively static characteristics of speakers).

We begin by commenting on how our work relates to pragmatics, historical linguistics, sociolinguistics and corpus linguistics. We then briefly describe our data, before moving on to the major part of this paper: a description of our annotation scheme, and the particular problems we encountered in implementing it.

2. Fields and interfaces: Research aims and methodologies

Our work interfaces with four fields: pragmatics, sociolinguistics, historical linguistics and corpus linguistics. We first offer a brief sketch of some of the traditional research aims and methodological preferences of pragmatics, historical linguistics, and sociolinguistics. Then we consider relatively recent developments that have taken place where these fields interface with corpus linguistics. Needless to say, our aim is not to be comprehensive here, but, rather, to give the flavour of similarities and differences.

2.1 Fields: Pragmatics, historical linguistics, and sociolinguistics

A particular feature of pragmatics research is its concern with language use in context. All the major theories in pragmatics capture some aspect of context. Most pragmatic studies have analysed spoken interaction, and have focussed, directly or indirectly, on 'real' language data, whether that data be elicited (e.g. through discourse completion tasks) or naturally occurring (e.g. recorded classroom interaction).² Studies involving large quantities of data are not unusual. For example, the Cross Cultural Speech Act Realization Patterns (CCSARP) project (see Applied Linguistics Vol. 5, No.
3, and Blum-Kulka 1989) is a study of data elicited by questionnaire, involving seven different languages or language varieties and 1088 informants. However, leafing through the papers in the Journal of Pragmatics swiftly reveals that the majority of studies contain small-scale qualitative analyses or theoretical descriptions.³

¹ Our work represents one strand within a much larger project – the 'Corpus of English Dialogues, 1560-1760' project, which aims at the exploration of Early Modern English dialogues through corpus linguistics methodology. This is an international project involving two teams: at Uppsala (Sweden) Merja Kytö, Terry Walker and Mattias Jacobsson, and at Lancaster (U.K.) the two authors of this paper and Michi Shiina. Needless to say, our work here has benefited from the assistance of these project members.

² Work at the pragmatics-cognition interface tends to be an exception. Relevance theory (Sperber and Wilson 1992), for example, contents itself with constructed examples.

³ An informal assessment of the first 25 papers published in 2001 in the Journal of Pragmatics revealed that only about seven both had a sufficient quantity of data for frequency work and actually presented frequency work (some papers simply discussed excerpts from a larger body of data). Of course, what counts as a 'large' body of data is a relative matter. Nevertheless, data sets were typically much smaller than those used in corpus linguistics (e.g. one pragmatic study used a corpus of 19,922 words; another used 29 texts, amounting to 271 turns).