A Study on the Use of a Voice Interactive System for Teaching English to Italian
Children
Diego Giuliani, Ornella Mich and Marianna Nardon
ITC - irst Centro per la Ricerca Scientifica e Tecnologica – Trento, ITALY
{giuliani,mich,nardon}@itc.it
Abstract
In this paper, we report on a study with two objectives.
The first was to investigate how children interact with
a multimedia system designed for learning a foreign
language and featuring automatic speech recognition
functionality. The second was to investigate whether
users of such a system meet particular difficulties in
dealing with the microphone or in solving tasks that imply
voice interaction. The study was based on the Italian
version of “Tell me More. Kids: The City”, a multimedia
system dedicated to children aged 8 to 10. The results of
the study indicate that, when learners were involved in
tasks where they could freely decide whether and how many
times to use the vocal interaction modality, they did not
hesitate to use it. The study also revealed the
importance of the form of feedback chosen to report the
system’s judgment of the learner’s pronunciation.
1. Introduction
In recent years, due to the rapid advances of speech
recognition technology, a number of systems for
computer-assisted language learning have been proposed
that support interactive speaking practice [1,2,3]. Some of
them are explicitly conceived for children [4,5,6].
In this work we investigate the interaction of
children with a multimedia system designed for
learning a foreign language that features automatic
speech recognition functionality, in particular
pronunciation assessment capability. Moreover, we
investigate whether children meet particular difficulties
in dealing with the microphone or in solving tasks
that entail voice interaction with the computer.
The study was based on the Italian version of
Tell me More. Kids: The City, by Auralog [5].
2. Experimental environment
Two systems were initially considered: Happy English
by Editori Riuniti [4], based on the IBM ViaVoice
speech recognizer, and Tell me More. Kids: The City by
Auralog [5], based on proprietary speech recognition
technology. Three teachers tried both systems.
Considering their observations and our own experience
with both systems, we decided to use the Auralog system.
In order to obtain homogeneous experimental results,
all children carried out the same four tasks among
those available, in an order defined randomly for each
child. The four tasks were the following.
Task 1 - Library: the user chooses, among 12 words,
which ones to listen to and/or to utter, and receives
feedback on the quality of her pronunciation of each word.
Three types of feedback are offered: (a) the waveform of
the uttered word is displayed, for comparison purposes,
together with the waveform of the system’s reference
utterance; (b) the expression (sad or happy) and the color
(green or red) of an animated character, a parrot, change
according to the quality of the learner’s pronunciation;
(c) the expression (more or less smiling) and the position
(craning his neck) of a clown change according to the
system’s judgment.
Task 2 - Hidden Words: the objective is to find 8 words
hidden in a 10 x 10 square of characters and to repeat
their pronunciation. For each word, the user is given
text, a picture and audio.
Task 3 - Story: the objective is to re-order four pictures
representing a story and then to repeat the sentence
associated with each picture.
Task 4 - Color: the objective is to color a picture by
uttering the appropriate color for each of its 12 parts.
Twenty-five subjects (15 females and 10 males), aged 8 to
12 (average 10.3), participated in the experiment.
Children were classified according to their English
expertise: 15 beginners, who had studied English for 3,
4, or 5 years, and 10 starters, with 0, 1, or 2 years of
English study.
The experiment followed a 4 (task) x 2 (English
expertise) x 4 (task order) mixed design (both between-
subject and within-subject factors).
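The assignment procedure described above (a per-child random order of the four tasks, plus a two-level expertise grouping by years of English study) can be sketched as follows. This is only an illustration, not the procedure actually used in the study: the function names, the child identifiers, the fixed seed and the use of a uniform random permutation are assumptions, and children with more than 5 years of study, a case the paper does not mention, are simply treated as beginners here.

```python
import random

# Task names as listed in the paper; everything else below is hypothetical.
TASKS = ("Library", "Hidden Words", "Story", "Color")


def expertise_group(years_of_english):
    """Map years of English study to the paper's two groups.

    0-2 years -> "starter", 3 or more -> "beginner" (the paper only
    reports 3, 4 or 5 years for beginners; the open upper bound is an
    assumption of this sketch).
    """
    if years_of_english < 0:
        raise ValueError("years of study cannot be negative")
    return "starter" if years_of_english <= 2 else "beginner"


def random_task_order(rng):
    """Return the four tasks as one uniformly drawn random permutation."""
    order = list(TASKS)
    rng.shuffle(order)
    return order


def plan_sessions(years_by_child, seed=0):
    """Build a group label and a task order for every child.

    A seeded generator makes the assignment reproducible.
    """
    rng = random.Random(seed)
    return {
        child: {"group": expertise_group(years), "order": random_task_order(rng)}
        for child, years in years_by_child.items()
    }


if __name__ == "__main__":
    # Hypothetical children, not data from the study.
    demo = {"child_01": 0, "child_02": 3, "child_03": 5}
    for child, plan in plan_sessions(demo).items():
        print(child, plan["group"], " -> ".join(plan["order"]))
```

Every child still performs all four tasks (the within-subject factor), while expertise varies between children; only the order of execution is randomized.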
3. The study results
Because the Library task is completely different in
objectives and modality of execution from the other three,
we decided to analyze it separately. Regarding the relation
between the number of listenings per word and the English
Proceedings of the 3rd IEEE International Conference on Advanced Learning Technologies (ICALT’03)
0-7695-1967-9/03 $17.00 © 2003 IEEE