CONTROL ID: 3008913
TITLE: Computer-vision analysis shows different facial movements for the production of different Mandarin tones
AUTHORS (FIRST NAME, LAST NAME): Saurabh Garg1, Lisa Tang5, Ghassan Hamarneh3, Allard Jongman2, Joan A. Sereno2, Yue Wang4
INSTITUTIONS (ALL):
1. Pacific Parkinson's Research Centre, University of British Columbia, Vancouver, BC, Canada.
2. Department of Linguistics, University of Kansas, Lawrence, KS, United States.
3. School of Computer Science, Simon Fraser University, Burnaby, BC, Canada.
4. Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada.
5. School of Computer Science, Simon Fraser University, Burnaby, BC, Canada.
ABSTRACT BODY:
Abstract (200 words): We aim to identify the visual cues produced by facial movements during Mandarin tone production and to examine how they are associated with each of the four tones. We use signal processing and computer vision techniques to analyze audio-video recordings of 21 native Mandarin speakers uttering the vowel /ɜ/ with each tone. Four facial interest points were automatically detected and tracked in the video frames: the medial point of the left eyebrow, the nose tip (a proxy for head movement), and the midpoints of the upper and lower lips. Spatiotemporal features were extracted from the positional profile of each tracked point, including the distance, velocity, and acceleration of local facial movements relative to each speaker's resting face. Analysis of variance and feature-importance analysis based on a random decision forest were performed to examine the significance of each feature in representing each tone and how well these features, individually and collectively, characterize each tone.
Preliminary results suggest alignments between articulatory movements and pitch trajectories: downward or upward head and eyebrow movements following the dipping and rising tone trajectories, faster lip closing toward the end of falling-tone production, and minimal movement for the level tone.
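The feature-extraction and feature-importance steps described in the abstract can be sketched in code. This is a minimal illustration, not the authors' implementation: the frame rate, the synthetic trajectories, and all function names are assumptions. For each tracked point, the distance from the resting position is computed per frame, velocity and acceleration are taken as its temporal derivatives, and a random decision forest ranks the resulting features by importance.

```python
# Hypothetical sketch of the pipeline described in the abstract: distance /
# velocity / acceleration features from a tracked point's positional profile,
# followed by random-forest feature-importance analysis. Synthetic data only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
FPS = 30.0  # assumed video frame rate (illustrative)

def spatiotemporal_features(xy, rest_xy, fps=FPS):
    """Summarize one tracked point's trajectory (n_frames x 2 coordinates).

    Returns the mean distance from the resting position, plus the mean
    absolute velocity and acceleration of that distance signal.
    """
    dist = np.linalg.norm(xy - rest_xy, axis=1)  # distance from resting face
    vel = np.gradient(dist) * fps                # first temporal derivative
    acc = np.gradient(vel) * fps                 # second temporal derivative
    return np.array([dist.mean(), np.abs(vel).mean(), np.abs(acc).mean()])

# Synthetic stand-in for recordings of the four tones: each "recording" is a
# short 2-D trajectory of one facial point, with tone-dependent movement.
X, y = [], []
for tone in range(4):
    for _ in range(30):
        t = np.linspace(0.0, 1.0, 60)
        drift = tone * np.sin(np.pi * t)         # tone-dependent movement
        xy = np.stack([drift + rng.normal(0, 0.1, 60),
                       rng.normal(0, 0.1, 60)], axis=1)
        X.append(spatiotemporal_features(xy, rest_xy=np.zeros(2)))
        y.append(tone)
X, y = np.array(X), np.array(y)

# Random decision forest as the feature-importance analysis.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip(["distance", "velocity", "acceleration"],
                     forest.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

In this sketch the same three-number summary would be computed for each of the four interest points and concatenated into one feature vector per utterance; the forest's impurity-based importances then indicate which features best separate the tones.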