A memory-based account of constructional differences between Netherlandic and Belgian Dutch

Antal van den Bosch*, Stefan Grondelaers*, and Dirk Speelman**

* Centre for Language Studies, Radboud University Nijmegen, The Netherlands
** Quantitative Lexicology and Variational Linguistics, KU Leuven, Belgium

Abstract. We present a memory-based learning (MBL) model for placing existential er in Netherlandic and Belgian Dutch locative inversion constructions such as In de asbak ligt (er) een sigarenpeuk “in the ashtray (there) was a cigar butt”. Memory-based learning is a computational algorithm that stores some representation of a training set in memory and classifies new cases by extrapolation from the most similar stored cases. In this case study, the model is trained either on small sets of manually annotated examples or on a large number of examples of adjuncts and verbs on the left side of the potential er location, drawn from automatically parsed treebanks. We find that while er placement in Netherlandic Dutch is relatively easy to learn, even on the basis of a few hundred examples, learning to place er in Belgian Dutch is much harder, and best done with a large training set of automatically gathered examples. The Netherlandic MBL system rivals the performance of a regression-based approach that uses only high-level features. Cross-variant training and testing reveal that the relatively predictable Netherlandic Dutch data may serve as reasonable training data for er placement in the Belgian Dutch data, but the reverse is not the case, suggesting a relatively noisy process of selecting er in Belgian Dutch, where (local) higher-order considerations play a much larger role than in Netherlandic Dutch. Theoretically, MBL affords access to hitherto unexplored lexical aspects of syntactic preference and of constructional differences between Belgian and Netherlandic Dutch.
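The store-and-extrapolate idea described in the abstract can be illustrated with a minimal sketch. It assumes a plain k-nearest-neighbour classifier with an unweighted overlap distance and majority voting, which is one simple way to realise memory-based classification and not necessarily the configuration used in this study. The class name, the three-slot feature layout, and the toy instances are all hypothetical and invented for illustration; they do not reproduce data or findings from the paper.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


class MemoryBasedClassifier:
    """Hypothetical minimal memory-based learner (k-NN, overlap metric)."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # stored (features, label) pairs

    def fit(self, instances, labels):
        # "Training" only stores the examples; nothing is abstracted away.
        self.memory = list(zip(instances, labels))
        return self

    def predict(self, instance):
        # Rank stored cases by distance to the new case (stable sort),
        # then let the k nearest cases vote on the label.
        ranked = sorted(
            self.memory,
            key=lambda item: overlap_distance(item[0], instance),
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Invented toy instances: (first word of adjunct, adjunct head, verb).
# The labels are arbitrary and carry no linguistic claim.
train_x = [
    ("in", "asbak", "ligt"),
    ("in", "tuin", "staat"),
    ("op", "tafel", "ligt"),
    ("morgen", "-", "komt"),
    ("daar", "-", "is"),
]
train_y = ["no_er", "no_er", "no_er", "er", "er"]

clf = MemoryBasedClassifier(k=3).fit(train_x, train_y)
print(clf.predict(("in", "kast", "ligt")))
print(clf.predict(("morgen", "-", "is")))
```

Because all learning effort is deferred to classification time, adding training examples only means appending to the stored memory, which is why the same sketch applies unchanged to a few hundred annotated examples or to a large automatically gathered set.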
1 Introduction

With the increased availability of digitized or born-digital language corpora and computational tools to analyse these corpora, studies on syntactic variation are turning to these resources for hypothesis testing. As follows from Zipf’s law, larger corpora contain ever more attestations of ever more word tokens (Zipf 1935), and hence of word n-grams, and constructions. It is not uncommon for corpora to contain hundreds of millions to billions of words nowadays (Oostdijk et al. 2008); corpora harvested from the web may contain one or two orders of magnitude more (Davies 2013–). The fact that even rare phenomena tend to surface in statistically sufficient quantities in these corpora offers a solution to the oft-noted problem of token paucity in (socio-) syntactic research (Milroy and Gordon 2003).