Grammar Induction Profits from Representative Stimulus Sampling

Fenna H. Poletiek (poletiek@fsw.leidenuniv.nl)
Department of Psychology, Leiden University, PO Box 9555, Leiden, The Netherlands

Nick Chater (n.chater@ucl.ac.uk)
Department of Psychology, University College London, Gower Street, London, WC1E 6BT, UK

Abstract

Sensitivity to distributional characteristics of sequential linguistic and nonlinguistic stimuli has been shown to play a role in learning the underlying structure of these stimuli. A growing body of experimental and computational research with (artificial) grammars suggests that learners are sensitive to various distributional characteristics of their environment (Kuhl, 2004; Onnis, Monaghan, Richmond & Chater, 2005; Rohde & Plaut, 1999). We propose that, at a higher level, statistical characteristics of the full sample of stimuli on which learning is based also affect learning. We provide a statistical model that accounts for such an effect, together with experimental data obtained with the Artificial Grammar Learning (AGL) methodology, showing that learners are also sensitive to distributional characteristics of a full sample of exemplars.

Keywords: Artificial grammar learning; statistical learning; frequency distribution

Introduction

People seem naturally sensitive to structural characteristics of their environment, and they are able to use this knowledge adaptively. Such learning occasionally occurs implicitly and without instruction. For example, learning motor patterns like tying shoelaces and riding a bicycle, and several aspects of social behavior, involves implicit associative learning. However, children's learning of the rules of language is probably the most striking example of acquiring structural knowledge without apparent explicit awareness.
Though it is currently debated to what extent this achievement is due to an innate predisposition or to a general inductive learning capability, a growing number of studies suggests that humans have a powerful and adaptive sensitivity to the statistical properties of environmental stimuli (Kuhl, 2004; Gomez & Gerken, 2000; Redington, Chater & Finch, 1998). In particular, studies on implicit sequence learning have revealed that statistical patterns can be picked up and exploited in subsequent use of the system. For example, distributional characteristics of linguistic materials have been suggested to support syntactic category learning (Onnis, Monaghan, Richmond & Chater, 2005; Mintz, 2002). Infant studies suggest that babies segment a stream of sounds (artificial words and syllables) on the basis of statistical associative properties of these sequences (Saffran, Aslin & Newport, 1996).

Recently, another kind of regularity in sequential information has been proposed to affect segmentation: differences in variability between adjacent elements. For example, the word 'walking' consists of a highly variable part ('walk') and an invariant part ('ing'). This difference in variability has been proposed to serve as a cue for finding the borders of linguistic units such as words (Gomez, 2002; Monaghan, Onnis, Christiansen & Chater, submitted). Finally, semantic regularities and associations are proposed to play a role in learning grammatical regularities (Rohde & Plaut, 1999): some words are much more strongly associated than others, for semantic reasons. For example, 'he walks' is more frequent than 'he city', suggesting permissible and impermissible word (category) orders.

Besides the statistical characteristics of local transitions within a structured sequence, statistical characteristics of the full sample of exemplars with which a learner is trained may be informative for grammar induction as well (Poletiek, 2006).
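The segmentation-from-statistics idea above can be made concrete with a minimal sketch: computing the transitional probability of each adjacent syllable pair in a continuous stream. The syllable inventory and the three toy "words" below are invented for illustration and are not the materials used in the infant studies; the point is only that within-word transitions recur while cross-boundary transitions vary.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(B | A) = count(A followed by B) / count(A), over adjacent pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# A stream concatenating three hypothetical words: bidaku, padoti, golabu.
stream = "bi da ku pa do ti go la bu bi da ku go la bu pa do ti".split()
tps = transitional_probabilities(stream)

# Within-word transitions (e.g. bi -> da) reach probability 1.0, while
# cross-boundary transitions (e.g. ku -> pa) are lower, so dips in
# transitional probability mark candidate word borders.
```

A learner tracking such dips can segment the stream without any pauses or other acoustic cues, which is the statistical mechanism the infant studies point to.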
Consider a grammar G producing exemplars varying in length. A random output of such a grammar would be a sample containing short and long exemplars that instantiate all kinds of sequential rules specified by the grammar, but not all rules in equal number. Indeed, such output would contain more exemplars instantiating typical and highly frequent rules of the grammar (for example, highly associated adjacent or nonadjacent elements) than exemplars instantiating exceptional rules (Chater & Vitányi, submitted). Also, in the general case, short exemplars would occur more often than long ones. The resulting frequency distribution of this random output sample of G may provide information to a learner for inducing G. Thus, in addition to the distributional features within exemplars (e.g., of sounds, syllables and words in natural language), the learner may benefit from distributional characteristics between exemplars of a full input sample (Poletiek, 2006).

In natural language, these high-level distributional characteristics of the linguistic input are obvious. Some grammatical constructions are much more frequent than others. For example, sentences with three (levels of) self-embedded relative clauses are rare as compared to sentences
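The sampling argument can be sketched with a toy probabilistic grammar. The grammar, its symbol set, and the stopping probability below are illustrative inventions, not the grammar used in the AGL experiments; the sketch only shows why a random output sample is dominated by short exemplars.

```python
import random
from collections import Counter

def generate(rng, p_stop=0.4, symbols="MTVX"):
    """One exemplar from a toy probabilistic grammar: emit a symbol,
    then stop with probability p_stop (so length is geometric, min 1)."""
    out = [rng.choice(symbols)]
    while rng.random() > p_stop:
        out.append(rng.choice(symbols))
    return "".join(out)

rng = random.Random(0)
sample = [generate(rng) for _ in range(10_000)]
lengths = Counter(len(s) for s in sample)

# The length distribution is geometric: roughly 40% of exemplars have
# length 1, and each extra symbol cuts the frequency by a further 60%,
# so short, rule-typical exemplars dominate the random output.
```

A learner receiving such a sample could, in principle, exploit this between-exemplar frequency distribution as evidence about G, over and above the within-exemplar transition statistics.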