Towards an identification of prototypical non-native modal constructions in EFL: A corpus-based approach

SANDRA C. DESHORS *

Abstract

At the intersection of corpus linguistics, cognitive linguistics and interlanguage (IL) research, this work models the IL properties of written constructions with may, can, must and will in four varieties of learner English (Chinese-, French-, German- and Swedish-English interlanguage) and British English. Following the usage-based assumption that linguistic knowledge is both item-specific and schematic in nature, and building on recent psycholinguistic work showing learners’ difficulties in producing native-like constructions (Ellis and Sagarra 2011), this study proposes a multifactorial way of exploring learners’ knowledge of constructions at concrete and schematic levels simultaneously. Specifically, 1903 constructions are investigated using a comprehensive corpus annotation scheme and multinomial regression modelling. This work reveals different patterns of central tendency of constructions across IL and native English. Crucially, it emerges that although, abstractly, learners and native speakers share similar schematic constructs, more superficially, learners’ linguistic representations of those constructs systematically deviate from those of natives. Ultimately, this work sets the scene for experimental research on prototypical non-native modal constructions and raises the question of what constitutes an adequate level of (corpus) description for the profiling of IL grammars.

Keywords: interlanguage, construction grammar, English modals, corpus linguistics, multifactorial approach, multinomial logistic regression

1 Background of the study

Interlanguage (IL) varieties (i.e., varieties of given languages developed by non-native speakers of those languages) have long been recognized as linguistic systems in their own right.
In that regard, Selinker (1969: 71) writes that “the recognition of the existence of an interlanguage cannot be avoided and must be dealt with as a system”. According to Adjemian (1976), one of the most salient characteristics of IL systems is their permeable nature. Crucially, such permeability allows learners to transfer grammatical properties from their native language (henceforth L1) and to generalize target language properties in an effort to communicate. Put differently, and as De Bot et al. (2007: 19) more recently note, “languages and accordingly second languages behave like complex, dynamic systems”.

The recognition of ILs as linguistic systems has led to the development of a growing body of corpus research concerned with the automated profiling of learner language varieties and the identification of linguistic patterns characteristic of IL grammars. A main challenge for learner corpus researchers has thus been to develop methodological approaches that capture traces of non-nativeness in learners’ speech (in the broad sense of the term) based on learners’ deviations from a native norm chosen for comparison. Ultimately, such methodological approaches help us further our understanding of the grammatical mechanisms involved in second language (henceforth L2) production. So far, a particularly fruitful way of achieving that goal has involved contrasting distributional patterns of formal linguistic items across native and learner language. For instance, analysts have compared the frequencies of occurrence of linguistic items in terms of their over- and underuse in learner language