A Constructivist Approach to Robot Language Learning via Simulated Babbling and Holophrase Extraction

Joe Saunders, Caroline Lyon, Frank Förster, Chrystopher L. Nehaniv and Kerstin Dautenhahn

Abstract— It is thought that meaning may be grounded in early childhood language learning via the physical and social interaction of the infant with those around him or her, and that the capacity to use words and phrases, together with their meanings, is acquired through shared referential ‘inference’ in pragmatic interactions. In order to create appropriate conditions for language learning by a humanoid robot, it would therefore be necessary to expose the robot to similar physical and social contexts. However, in the early stages of language learning it is estimated that a 2-year-old child can be exposed to as many as 7,000 utterances per day in varied contextual situations. In this paper we report on the issues behind, and the design of, our ongoing and forthcoming experiments, which aim to allow a robot to carry out language learning in a manner analogous to that in early child development and which effectively ‘short cut’ holophrase learning. Two approaches are used: (1) simulated babbling through mechanisms which will yield basic word or holophrase structures and (2) a scenario for interaction between a human and the humanoid robot where shared ‘intentional’ referencing and the associations between physical, visual and speech modalities can be experienced by the robot. The output of these experiments, combined to yield word or holophrase structures grounded in the robot’s own actions and modalities, would provide scaffolding for further proto-grammatical usage-based learning. This requires interaction with the physical and social environment involving human feedback to bootstrap developing linguistic competencies. These structures would then form the basis for further studies on language acquisition, including the emergence of negation and more complex grammar.

I. INTRODUCTION

In learning to use language to communicate and manipulate the world around them, human children benefit from a positive feedback loop involving individual learning (by using their hands and bodies to interact with objects around them), social learning (via close interaction with parents and others), and gradual acquisition of linguistic competencies. This feedback cycle supports the scaffolding of increasingly complex skill learning and linguistic development, giving the child ever greater mastery of its social and physical environment, as well as supporting the development of cognitive and conceptual capabilities that would seem impossible without language. Our work is aimed at realizing this same kind of feedback cycle, supporting the scaffolding of behavioural, linguistic and conceptual competencies in robots. The purposes of doing this are not only to better understand possible mechanisms for such learning in humans, but also to achieve similar competencies in artificial agents and robots (even if they are not acquired by exactly the same routes).

The authors are with the Adaptive Systems Research Group, Centre for Computer Science and Informatics Research, University of Hertfordshire, College Lane, Hatfield, Herts AL10 9AB, United Kingdom (email: {J.1.Saunders,C.M.Lyon,C.L.Nehaniv,K.Dautenhahn,F.Forster}@herts.ac.uk).

The work described in this paper was conducted within the EU Integrated Project ITalk (“Integration and Transfer of Action and Language in Robots”) funded by the European Commission under contract number FP7-214668.

In this paper we report on our ongoing and proposed forthcoming experiments which employ the ideas above. This work is inspired by the observed progress in language acquisition by human infants. Though we do not aim to simulate this development as a whole, we investigate certain mechanisms that could play a key role.
We make the artificial assumption that we can isolate different developmental paths and examine them separately. Specifically, in our experimental scenarios, we initially model the acquisition of the phonetic form of words and holophrases without meaning, by employing simple learning mechanisms. Functional use and ‘referential understanding’ of utterances (e.g. registration of sensorimotor and environmental regularities), in a human-robot interaction context where joint-attentional framing and simple actions with objects are possible, will be introduced in subsequent steps as detailed below. Meanwhile, lexical/holophrastic learning continues and serves to bootstrap (1) learning of sensorimotor and interactional grounding of speech and behavioural skills, and (2) learning of the usage of learnt linguistic structures for the interacting robot to generate utterances that serve to manipulate its physical and social environment. The emergence of the capacity to use various forms of linguistic negation is also targeted in this experimental setting.

II. FROM BABBLING TO THE ACQUISITION OF WORDS AND PHRASES WITHOUT MEANING

Initially, the infant’s ability to perceive and analyse acoustic signals is much greater than the ability to produce them. This contrasts with the mature speaker, who has matching perceptive and productive competencies, linked in mirror neuron processors. Cognitive capacity precedes productive abilities in speech. For instance, infants can use function words to help segment and analyse utterances before they produce them themselves [1, p. 201]. In an analogous way our Linguistically Enabled Synthetic Agent (LESA) will have an asymmetrical language competence. We aim for it to take in natural English and to respond with appropriate actions and spoken comments. However, its spoken output will be limited.
978-1-4244-2763-5/09/$25.00 ©2009 IEEE Authorized licensed use limited to: UNIVERSITY OF HERTFORDSHIRE. Downloaded on May 19, 2009 at 09:48 from IEEE Xplore. Restrictions apply.

The development of the ability to segment a speech stream into words and phrases overlaps with the acquisition of semantic understanding and the mastery of primary language structure, but here we initially investigate only the emergence of stable phonetic forms or strings independent of meaning. Our simulation aims to show how a