Multimodal speech data acquisition with the use of EMA, fast-speed video cameras and a dedicated microphone array

Łukasz Mik, Robert Wielgat, Daniel Król, Rafał Jędryka
State Higher Vocational School in Tarnów, Polytechnic Institute
Tarnów, Poland
l_mik@pwsztar.edu.pl

Anita Lorenc
Maria Curie-Skłodowska University, Department of Speech Therapy and Applied Linguistics
Lublin, Poland

Radosław Święciński
Amsterdam University of Applied Sciences, International Business School
Amsterdam, Netherlands

Abstract—Electromagnetic Articulography (EMA) is one of the methods used to examine tongue movements. The technique is usually supported by audio recordings and may be supplemented by video data to analyze the movements of the external articulators. In the research presented here, a new approach using a microphone array and synchronous video recordings is proposed, and an examination of a Polish nasal consonant is presented as an example of this novel method of analysis.

Keywords—electromagnetic articulography, acoustic camera, speech analysis

I. INTRODUCTION

Electromagnetic Articulography (EMA) is a technology created over two decades ago [1] and has been developed ever since. EMA acquires spatiotemporal data from sensors placed on the tongue; these data reflect the position, shape and dynamics of the tongue during the production of various speech sounds. The articulograph is often supported by an audio recorder [2] and a vision system [3-7]. Such video recording usually involves 2 or 3 cameras registering the image of the speaker's face. To track the movements of the external articulators (e.g., lips, jaw, cheeks) precisely, markers are often placed on the speaker's face [3-5,7]. Acoustic data are recorded simultaneously with the signals from the EMA sensors, usually on a single audio channel.
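Because the articulograph delivers sensor positions sampled at a fixed rate (200 Hz for the AG500 setup described in Section II), tongue dynamics can be derived directly from those samples. The sketch below is a minimal illustration, not the authors' processing pipeline: it assumes the 3D positions of a single sensor in millimetres, and the function name `sensor_speeds` is hypothetical. Speed is estimated with central differences in the interior and one-sided differences at the two ends.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def sensor_speeds(positions: Sequence[Point3D], fs_hz: float) -> List[float]:
    """Speed of one EMA sensor at each sample (position units per second).

    Hypothetical helper: central differences for interior samples,
    one-sided differences for the first and last samples.
    """
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / fs_hz  # sampling period, e.g. 5 ms at 200 Hz
    speeds = []
    for i in range(n):
        if i == 0:  # forward difference
            a, b, span = positions[0], positions[1], dt
        elif i == n - 1:  # backward difference
            a, b, span = positions[-2], positions[-1], dt
        else:  # central difference over two sampling periods
            a, b, span = positions[i - 1], positions[i + 1], 2 * dt
        speeds.append(math.dist(a, b) / span)
    return speeds
```

For example, a sensor that moves 0.5 mm along one axis between consecutive 200 Hz samples yields a speed of about 100 mm/s at every sample. Central differences are used because they are less biased than one-sided differences for smoothly moving sensors; real EMA data would normally be low-pass filtered first.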
It should be noted that the simultaneous registration of EMA, audio and visual data is rarely reported in the literature. In this paper, a novel system integrating EMA, audio and visual data recording is presented. The articulatory data were obtained with a Carstens AG500 articulograph. The vision system consisted of three high-speed cameras (Gazelle GZL-CL-22C5M-C) manufactured by Point Grey, which registered the movements of OptiTrack reflective markers attached to the speakers' faces. The audio recorder consisted of a 16-channel microphone array and an electronic device that registered and processed the microphone signals. The microphone array made it possible to map sound sources on the speaker's face [12-14]. Such an audio system has not previously been used in speech research.

II. SYSTEM DESCRIPTION

A block diagram of the whole system is presented in Fig. 1. The simultaneous recording of signals from the EMA, the video system and the audio recorder is controlled by a host program running on a computer and is supported by a synchronizer. The electromagnetic articulograph registers the signals from the EMA sensors, from which their spatiotemporal positions are obtained at a sampling frequency of 200 Hz.

Fig. 1. Block diagram of the measurement system.
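The text states that the 16-channel array maps sound sources on the speaker's face but does not specify the mapping algorithm at this point. Acoustic cameras commonly use delay-and-sum beamforming, so the following is a minimal narrowband sketch of that technique, not the authors' implementation. Assumptions: a hypothetical 4 × 4 planar layout with 4 cm spacing (16 microphones, matching the channel count above), a single tonal source, free-field propagation at 343 m/s with phase delay only (no 1/r amplitude decay), and a scan plane 0.5 m in front of the array. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def grid_array(n_side: int, spacing: float) -> List[Point3D]:
    """Hypothetical square planar microphone array in the z = 0 plane, centred on the origin."""
    offset = (n_side - 1) * spacing / 2
    return [(i * spacing - offset, j * spacing - offset, 0.0)
            for i in range(n_side) for j in range(n_side)]


def simulate_phasors(mics: Sequence[Point3D], source: Point3D, freq: float) -> List[complex]:
    """Narrowband phasor at each microphone from a point source (phase delay only)."""
    k = 2 * math.pi * freq / SPEED_OF_SOUND  # wavenumber
    return [cmath.exp(-1j * k * math.dist(m, source)) for m in mics]


def das_map(mics: Sequence[Point3D], phasors: Sequence[complex],
            grid: Sequence[Point3D], freq: float) -> List[float]:
    """Delay-and-sum output power, normalised by the microphone count, for each scan point."""
    k = 2 * math.pi * freq / SPEED_OF_SOUND
    n = len(mics)
    powers = []
    for p in grid:
        # Steering: compensate the propagation phase expected from point p
        # to each microphone, so a source located at p adds coherently.
        s = sum(x * cmath.exp(1j * k * math.dist(m, p)) for m, x in zip(mics, phasors))
        powers.append(abs(s / n) ** 2)
    return powers
```

For example, scanning an 11 × 11 grid with 2 cm steps at z = 0.5 m for a simulated 3 kHz source placed on one grid point returns that point as the maximum, with a normalised power of 1. The 4 cm spacing is kept below half a wavelength (about 5.7 cm at 3 kHz) to avoid grating lobes; with such a small aperture the main lobe is wide, so neighbouring points score only slightly lower, and resolution improves with a larger aperture or higher frequency.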