International Journal of Corpus Linguistics 21:3 (2016), 348–371. doi 10.1075/ijcl.21.3.03die issn 1384–6655 / e-issn 1569–9811 © John Benjamins Publishing Company

Compiling computer-mediated spoken language corpora
Key issues and recommendations

Stefan Diemer, Marie-Louise Brunner and Selina Schmidt
Saarland University

This paper discusses key issues in the compilation of spoken language corpora in a computer-mediated communication (CMC) environment, using data from the Corpus of Academic Spoken English (CASE), a corpus of Skype conversations currently being compiled at Saarland University, Germany, in cooperation with European and US partners. Based on first findings, Skype is presented as a suitable tool for collecting informal spoken data. In addition, new recommendations concerning data compilation and transcription are put forward to supplement existing best practice as presented in Wynne (2005). We recommend the preservation of multimodal features during anonymisation, and the addition of annotation elements already at the transcription stage, particularly CMC-related discourse features, English as a Lingua Franca (ELF) features (e.g. non-standard language and code-switching), as well as the inclusion of prosodic, paralinguistic, and non-verbal annotation. Additionally, we propose a layered corpus design in order to allow researchers to focus on specific annotation features.

Keywords: spoken language corpora, data compilation and transcription, Computer-mediated communication (CMC), best practice, Skype

1. Introduction

In this paper, we look at key issues related to the compilation of spoken language corpora in a computer-mediated communication (CMC) environment.
It has been more than ten years since Thompson (2005) addressed the issue of compiling spoken corpora to establish best practice recommendations, and in the years since the guidelines were published, there have been considerable changes both in technology and in the quality of the linguistic data collected. In particular, while much research has been focusing on written CMC, spoken conversations