Data Collection and Evaluation of Speech Recognition for Motorbike Riders

H. Tanaka, H. Fujimura, C. Miyajima, T. Nishino, K. Itou, and K. Takeda
Graduate School of Information Science, Nagoya University
h-tanaka@sp.m.is.nagoya-u.ac.jp

Abstract

Speech recognition should serve as an eyes-free and hands-free interface for motorbike riders. To realize this technology, we need to clarify the acoustics in a helmet and determine how much high-level riding noise affects captured speech data. This paper describes the acoustics in a helmet and the transfer functions for different microphone positions. We constructed a data collection system and collected the speech data of motorbike riders on city roads and express highways. Speech recognition experiments were conducted, and we obtained a recognition rate as high as 83.1%.

1. Introduction

Riding a motorbike requires more care than driving a car. Even when the motorbike is idling, for example while the rider waits at a red light, button operations are inconvenient because the rider needs to remove his or her gloves to push the buttons. Motorbike riders therefore need an eyes-free and hands-free interface for operating information appliances such as cellular phones and route navigation systems, which makes speech recognition a very important technology.

This study investigated the feasibility of speech recognition for motorbike riders. On a motorbike, riders are exposed directly to high-level noise such as wind noise, engine noise, and road noise. The noise level to which a rider is exposed is known to vary with factors such as speed, riding position, and the helmet [1, 2]. To realize speech recognition on a motorbike, we first need to investigate how much such factors degrade conventional speech recognition performance.

Riders must put on a helmet when they ride motorbikes.
Helmets are designed to reduce the noise level; however, we need to clarify how much this reduction contributes to speech recognition when microphones are mounted on the inside of the helmet. Moreover, we need to investigate the acoustics in a helmet, because a helmet has a very small cavity.

In this paper, we measured the acoustics in a helmet to determine microphone positions for collecting a corpus of riders’ speech. We then collected speech data uttered by motorbike riders riding on a highway. We also provide an analysis of the corpus and the results of our speech recognition evaluation.

2. Acoustics in a Helmet

Acoustic features inside a helmet have unique characteristics because a helmet has only a small cavity when a rider puts it on. Since a microphone must be located close to the mouth, the directivity of the sound source, that is, the mouth, significantly influences the acoustic characteristics between the sound source and the microphone. We determined an appropriate microphone position for riders’ speech recognition by measuring the acoustic transfer function inside a helmet.

Figure 1: Microphone Position (actual positions are inside).

Figure 1 shows ten positions for installing a small microphone. Acoustic transfer functions between the artificial mouth of a head-and-torso simulator (B&K 4128) and every microphone were measured by using the Swept Sine signal [3]. A full-face helmet, the ARAI RAPIDE-OR, was used.

Figure 2 shows the magnitude responses. The following seven positions are inappropriate when riders put on the helmet: positions #1, #2, and #3 have no space to attach a microphone; at positions #6, #9, and #10 the microphone sank into the buffer material; and the #7 microphone was blown on strongly by the rider's breath. The remaining responses (#4, #5, and #8) differ from one another because of reflections arising from the close proximity of the mouth and the microphone.
Two sharp dips in #5 show that there are small cavities in the helmet; however, the responses fluctuated within -20 dB in the speech band. These responses were not ideal, but they were adequate for speech recognition.

Our recording system for the motorbike has only two input channels. We chose #4 and #8 for data collection because their responses were flatter and had no dips in the speech band. We refer to the #8 microphone as the mouth microphone and to the #4 microphone as the nose microphone.
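
The measurement and selection steps described in this section can be illustrated with a minimal sketch. It is not the procedure used in this study: it assumes an exponential sine sweep as the excitation, a regularized spectral division to estimate the transfer function, a 16 kHz sampling rate, and a 300 to 3400 Hz "speech band", none of which are specified in the text. The function names and the toy impulse response (a direct path plus one reflection) are hypothetical and stand in for a real recording at one microphone position.

```python
import numpy as np

FS = 16000  # sampling rate in Hz (assumption; not stated in the text)


def exponential_sweep(f_start, f_end, duration, fs=FS):
    """Exponential sine sweep from f_start to f_end (Hz) lasting `duration` s."""
    t = np.arange(int(duration * fs)) / fs
    rate = np.log(f_end / f_start)
    phase = (2.0 * np.pi * f_start * duration / rate
             * (np.exp(t * rate / duration) - 1.0))
    return np.sin(phase)


def estimate_magnitude_db(excitation, recording, fs=FS, eps=1e-8):
    """Estimate |H(f)| in dB as Y(f) X*(f) / (|X(f)|^2 + eps).

    `recording` must be at least as long as `excitation`; eps is a small
    regularizer that avoids division by zero outside the sweep range.
    """
    n = len(recording)
    X = np.fft.rfft(excitation, n)   # zero-padded to the recording length
    Y = np.fft.rfft(recording, n)
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mag_db = 20.0 * np.log10(np.abs(H) + 1e-12)
    return freqs, mag_db


def band_spread_db(freqs, mag_db, lo=300.0, hi=3400.0):
    """Max minus min magnitude (dB) inside the assumed speech band."""
    band = (freqs >= lo) & (freqs <= hi)
    return float(mag_db[band].max() - mag_db[band].min())


# Toy stand-in for a helmet response: direct path plus one reflection
# arriving 16 samples (1 ms at 16 kHz) later at half amplitude.
toy_ir = np.zeros(17)
toy_ir[0] = 1.0
toy_ir[16] = 0.5

sweep = exponential_sweep(100.0, 7000.0, duration=1.0)
recorded = np.convolve(sweep, toy_ir)  # simulated microphone signal

freqs, mag_db = estimate_magnitude_db(sweep, recorded)
print(f"spread in assumed speech band: {band_spread_db(freqs, mag_db):.2f} dB")
```

A smaller spread corresponds to a flatter response, so a value like this could be compared across candidate microphone positions. With real recordings, the toy convolution would be replaced by the signal captured at each position.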