Spanish Treebank Annotation of Informal Non-Standard Web Text ⋆

Mariona Taulé², M. Antonia Martí², Ann Bies¹, Montserrat Nofre², Aina Garí², Zhiyi Song¹, Stephanie Strassel¹ and Joe Ellis¹

Linguistic Data Consortium¹, University of Pennsylvania, 3600 Market Street, Suite 801, Philadelphia, PA, 19104, USA
CLiC², University of Barcelona, Gran Via 588, 08007 Barcelona, Spain

Abstract. This paper presents the Latin American Spanish Discussion Forum Treebank (LAS-DisFo). This corpus consists of 50,291 words and 2,846 sentences that are part-of-speech tagged, lemmatized and syntactically annotated with constituents and functions. We describe how it was built and the methodology followed for its annotation, the annotation scheme and criteria applied for dealing with the most problematic phenomena commonly encountered in this kind of informal unedited web text. This is the first available Latin American Spanish corpus of non-standard language that has been morphologically and syntactically annotated. It is a valuable linguistic resource that can be used for the training and evaluation of parsers and PoS taggers.

1 Introduction

In this article we present the problems found and the solutions adopted in the process of the tokenization, part-of-speech (PoS) tagging and syntactic annotation of the Latin American Spanish Discussion Forum Treebank (LAS-DisFo).¹ This corpus consists of a compilation of textual posts and includes suggestions, ideas, opinions and questions on several topics including politics and technology.

Like chats, tweets, blogs and SMS, these texts constitute a new genre that is characterized by an informal, non-standard style of writing, which shares many features with spoken colloquial communication: the writing is spontaneous, performed quickly and usually unedited.
⋆ This material is based on research sponsored by Air Force Research Laboratory and Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government.

¹ A Discussion Forum is an online asynchronous discussion board where people can hold conversations in the form of posted messages.

At the same time, to compensate for the lack of face-to-face interaction, the texts contain pragmatic information about mood and feelings, often expressed by paratextual clues: emoticons, capital letters and non-conventional spacing, among others. As