A Study on the Use of a Voice Interactive System for Teaching English to Italian Children

Diego Giuliani, Ornella Mich and Marianna Nardon
ITC - irst Centro per la Ricerca Scientifica e Tecnologica – Trento, ITALY
{giuliani,mich,nardon}@itc.it

Abstract

In this paper, we report on a study with two objectives. The first was to investigate how children interact with a multimedia system designed for learning a foreign language and featuring automatic speech recognition functionality. The second was to investigate whether users of such a system meet particular difficulties in dealing with the microphone or in solving tasks that involve voice interaction. The study was based on the Italian version of “Tell me More. Kids: The City”, a multimedia system intended for children aged 8 to 10. The results indicate that, when learners were involved in tasks where they could freely decide whether and how many times to use the vocal interaction modality, they did not hesitate to use it. The study also revealed the importance of the form of feedback chosen to report to the learner the system’s judgment on her pronunciation.

1. Introduction

In recent years, owing to rapid advances in speech recognition technology, a number of systems for computer-assisted language learning that support interactive speaking practice have been proposed [1,2,3]. Some of them are explicitly conceived for children [4,5,6]. In this work we investigate the interaction of children with a multimedia system designed for learning a foreign language that features automatic speech recognition functionality, in particular pronunciation assessment capability. Moreover, we investigate whether children meet particular difficulties in dealing with the microphone or in solving tasks that entail voice interaction with the computer. The study was based on the Italian version of Tell me More. Kids: The City, by Auralog [5].

2. Experimental environment

Two systems were initially considered: Happy English by Editori Riuniti [4], based on the IBM ViaVoice speech recognizer, and Tell me More. Kids: The City by Auralog [5], based on proprietary speech recognition technology. Three teachers tried both systems. Considering their observations and our own experience with both systems, we decided to use the Auralog system.

To obtain homogeneous experimental results, all children carried out the same four tasks among those available, in an order defined randomly for each child. The four tasks were:

task 1 - Library, in which the user can choose, among 12 words, which ones to listen to and/or utter, receiving feedback on the quality of her pronunciation of each word. Three types of feedback are offered: the waveform of the uttered word is displayed together with the waveform of the system's reference utterance, for comparison purposes; the expression (sad or happy) and the color (green or red) of an animated character, a parrot, change according to the quality of the learner’s pronunciation; the expression (more or less smiling) and the posture (craning his neck) of a clown change according to the system’s judgment;

task 2 - Hidden Words, whose objective is to find 8 words hidden in a 10 x 10 square of characters and repeat their pronunciation. For each word, the user is given its text, a picture and audio;

task 3 - Story, whose objective is to re-order four pictures representing a story and then repeat the sentence associated with each picture;

task 4 - Color, whose objective is to color a picture by uttering the appropriate color for each of its 12 parts.

Twenty-five subjects, 15 females and 10 males, aged from 8 to 12 (average 10.3), participated in the experiment. Children were classified according to their English expertise: 15 beginners, who had studied English for 3, 4, or 5 years, and 10 starters, with 0, 1, or 2 years of English study.
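The per-child task ordering described above can be illustrated with a short sketch. The paper does not say how the random orders were generated, so the uniform shuffle used here is an assumption, and the names TASKS, assign_task_orders and seed are hypothetical, introduced only for illustration.

```python
import random

# The four tasks named in the paper.
TASKS = ["Library", "Hidden Words", "Story", "Color"]


def assign_task_orders(child_ids, seed=None):
    """Return a dict mapping each child id to a randomly ordered task list.

    Assumption: each order is an independent uniform shuffle of the four
    tasks; the procedure actually used in the study is not described.
    """
    rng = random.Random(seed)  # local generator, so a seed makes runs repeatable
    orders = {}
    for child in child_ids:
        order = list(TASKS)    # copy, so the shared task list is never mutated
        rng.shuffle(order)
        orders[child] = order
    return orders


# One order for each of the 25 participants (ids 1..25 are illustrative).
orders = assign_task_orders(range(1, 26), seed=0)
```

Fixing the seed makes the assignment reproducible, which may help when the order of execution is later treated as a factor in the analysis.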
The experiment followed a 4 (task) x 2 (English expertise) x 4 (task order) mixed design (with both between-subject and within-subject factors).

3. The study results

Because the Library task is completely different in objectives and modality of execution from the other three, we decided to analyze it separately. Regarding the relation between the number of listenings per word and the English

Proceedings of the 3rd IEEE International Conference on Advanced Learning Technologies (ICALT’03) 0-7695-1967-9/03 $17.00 © 2003 IEEE