A Constructivist Approach to Robot Language Learning via
Simulated Babbling and Holophrase Extraction
Joe Saunders, Caroline Lyon, Frank Förster, Chrystopher L. Nehaniv and Kerstin Dautenhahn
Abstract— It is thought that meaning may be grounded in
early childhood language learning via the physical and social interaction of the infant with those around him or her, and that the capacity to use words and phrases, together with their meaning, is acquired through shared referential ‘inference’ in pragmatic interactions. To create appropriate conditions for language learning by a humanoid robot, it would therefore be necessary to expose the robot to similar physical and social contexts. However, in the early stages of language learning a 2-year-old child is estimated to be exposed to as many as 7,000 utterances per day in varied contextual situations, a volume of situated exposure that would be difficult to reproduce with a robot. In this paper we report on the issues behind, and the design of, our ongoing and forthcoming experiments, which aim to let a robot learn language in a manner analogous to early child development while effectively ‘short-cutting’ holophrase learning. Two approaches are used: (1) simulated babbling through mechanisms that will yield basic word or holophrase structures, and (2) a scenario for interaction between a human and the humanoid robot in which the robot can experience shared ‘intentional’ referencing and the associations between physical, visual and speech modalities. The outputs of these experiments, combined to yield word or holophrase structures grounded in the robot’s own actions and modalities, would provide scaffolding for further proto-grammatical, usage-based learning. This requires interaction with the physical and social environment, involving human feedback to bootstrap developing linguistic competencies. These structures would then form the basis for further studies on language acquisition, including the emergence of negation and more complex grammar.
I. INTRODUCTION
In learning to use language to communicate and manipulate the world around them, human children benefit from a positive feedback loop involving individual learning (interacting with objects around them using their hands and bodies), social learning (via close interaction with parents and others), and the gradual acquisition of linguistic competencies. This feedback cycle supports the scaffolding of increasingly complex skill learning and linguistic development, giving the child ever greater mastery of its social and physical environment, as well as supporting the development of cognitive and conceptual capabilities that would seem impossible without language. Our work aims to realize the same kind of feedback cycle, supporting the scaffolding of behavioural, linguistic and conceptual competencies in robots. The purposes of doing this are not only to better understand possible mechanisms for such learning in humans, but also to achieve similar competencies in artificial agents and robots (even if they are not acquired by exactly the same routes). In this paper we report on our ongoing and planned experiments, which build on these ideas.
The authors are with the Adaptive Systems Research Group, Centre for
Computer Science and Informatics Research, University of Hertfordshire,
College Lane, Hatfield, Herts AL10 9AB, United Kingdom (email:
{J.1.Saunders,C.M.Lyon,C.L.Nehaniv,K.Dautenhahn,F.Forster}@herts.ac.uk).
The work described in this paper was conducted within the EU Integrated
Project ITalk (“Integration and Transfer of Action and Language in Robots”)
funded by the European Commission under contract number FP7-214668.
This work is inspired by the observed progress in language acquisition by human infants. Though we do not aim to simulate this development as a whole, we investigate certain mechanisms that could play a key role. We make the admittedly artificial assumption that different developmental paths can be isolated and examined separately. Specifically, in our experimental scenarios we initially model, using simple learning mechanisms, the acquisition of the phonetic form of words and holophrases without meaning. Functional use and ‘referential understanding’ of utterances (e.g. registration of sensorimotor and environmental regularities) will be introduced in subsequent steps, as detailed below, in a human-robot interaction context where joint-attentional framing and simple actions with objects are possible. Meanwhile, lexical/holophrastic learning continues and serves to bootstrap (1) learning of the sensorimotor and interactional grounding of speech and behavioural skills, and (2) learning how the interacting robot can use the learnt linguistic structures to generate utterances that manipulate its physical and social environment. This experimental setting also targets the emergence of the capacity to use various forms of linguistic negation.
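The simple learning mechanisms for acquiring phonetic forms without meaning are not specified at this point in the paper. As a purely illustrative sketch, and not the authors’ method, the following Python code assumes that heard utterances are already transcribed into token lists (syllables or words) and swaps in plain n-gram frequency counting: any contiguous sequence heard at least `min_count` times is treated as a stable form, and a shorter form that never occurs outside a longer stable one is absorbed into it, leaving holophrase-like chunks. The function names `extract_holophrases` and `contains` and the parameters `max_len` and `min_count` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def contains(longer: tuple[str, ...], shorter: tuple[str, ...]) -> bool:
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def extract_holophrases(
    utterances: Iterable[list[str]],
    max_len: int = 4,
    min_count: int = 2,
) -> list[tuple[tuple[str, ...], int]]:
    """Hypothetical frequency-based extractor of recurring unanalysed forms.

    Assumption: each utterance is a list of syllable (or word) tokens.
    Every contiguous n-gram with 1 <= n <= max_len is counted; those heard
    at least `min_count` times are treated as stable forms.
    """
    counts: Counter[tuple[str, ...]] = Counter()
    for utt in utterances:
        for n in range(1, max_len + 1):
            for i in range(len(utt) - n + 1):
                counts[tuple(utt[i:i + n])] += 1

    stable = [g for g, c in counts.items() if c >= min_count]
    # A shorter form whose count equals that of a longer stable form
    # containing it never occurs outside that form, so it is absorbed
    # into the longer, holophrase-like chunk.
    kept = [
        g for g in stable
        if not any(
            len(h) > len(g) and counts[h] == counts[g] and contains(h, g)
            for h in stable
        )
    ]
    # Longest forms first, then most frequent, then alphabetical.
    kept.sort(key=lambda g: (-len(g), -counts[g], g))
    return [(g, counts[g]) for g in kept]


if __name__ == "__main__":
    heard = [
        ["look", "at", "the", "ball"],
        ["see", "the", "ball"],
        ["the", "ball", "is", "red"],
        ["look", "at", "the", "cup"],
    ]
    for form, count in extract_holophrases(heard):
        print(" ".join(form), count)
```

On this toy input the frame “look at the” emerges as a single unanalysed chunk, while “the”, which is heard in several different contexts, survives as a separate unit; raising `min_count` trades coverage for stability.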
II. FROM BABBLING TO THE ACQUISITION OF WORDS AND PHRASES WITHOUT MEANING
Initially, the infant’s ability to perceive and analyse acoustic signals is much greater than its ability to produce them. This contrasts with the mature speaker, whose perceptive and productive competencies are matched, linked in mirror neuron processors. Cognitive capacity precedes productive ability in speech. For instance, infants can use function words to help segment and analyse utterances before they produce those words themselves [1, p. 201]. In an analogous way, our Linguistically Enabled Synthetic Agent (LESA) will have an asymmetrical language competence: we aim for it to take in natural English and to respond with appropriate actions and spoken comments, but its spoken output will be limited. The development of the ability to segment a speech stream into words and phrases overlaps with the acquisition of semantic understanding and the mastery of primary language structure, but here we initially investigate only the emergence of stable phonetic forms or strings, independent of meaning. Our simulation aims to show how a
978-1-4244-2763-5/09/$25.00 ©2009 IEEE