This is the postprint (the non-typeset version reflecting changes made during the peer review process) of a contribution to: Wigham, C. R., & Stemle, E. W. (eds.) (2019). Building Computer-Mediated Communication Corpora for sociolinguistic Analysis. Presses Universitaires Blaise Pascal. The content must not be used in a commercial context.

Comparison of Automatic vs. Manual Language Identification in Multilingual Social Media Texts

Jennifer-Carmen FREY, Eurac Research, Italy
Egon W. STEMLE, Eurac Research, Italy
A. Seza DOĞRUÖZ, Independent Researcher

Multilingual speakers communicate in more than one language in daily life and on social media. Processing or investigating multilingual communication requires language identification. This study compares the performance of human annotators with that of automatic language identification on a multilingual (mainly German-Italian-English) social media corpus collected in South Tyrol, Italy. Our results indicate that humans and Natural Language Processing (NLP) systems each follow their own strategies when deciding on the language(s) of multilingual text messages. This results in low agreement when different annotators or NLP systems perform the same task. In general, human annotators agree with each other more than NLP systems do. However, human agreement also varies depending on whether or not guidelines for the annotation task were established beforehand.

1. Introduction

1.1. Languages in South Tyrol

South Tyrol (ST, or Alto Adige) is a multilingual province in Northern Italy that hosts German and Italian speakers and is characterized by territorial and institutional multilingualism (Abel, Vettori & Forer, 2012). Italian, German and Ladin (a group of Romance dialects spoken in Northern Italy) are acknowledged as official languages and are used in administrative communication and schooling.
At the personal level, residents officially declare their language affiliation to ensure that public funding is distributed according to the proportion of each language group [10]. Parents may choose the school and the language of instruction for their

[10] The latest census (2011) reports the following composition for the whole province: 69.6% of the population belong to the German language group, 25.8% to the Italian language group and 4.5% to the Ladin language group. However, these numbers differ substantially in urban areas like Bozen/Bolzano and