Combined cine- and tagged-MRI for tracking landmarks on the tongue surface
Honghao Bao 1, Wenhuan Lu 1, Kiyoshi Honda 2,*, Jianguo Wei 1, Qiang Fang 3, Jianwu Dang 2,4
1 School of Computer Software, Tianjin University
2 School of Computer Science and Technology, Tianjin University
3 Chinese Academy of Social Science
4 School of Information Science, JAIST, Japan
bao3009218057@126.com, Jianguo@tju.edu.cn, khonda@sannet.ne.jp
Abstract
Magnetic resonance imaging (MRI) techniques have become a promising tool in recent speech production studies, and dynamic imaging with repetitive or real-time MRI scans has been widely used to acquire the motion of all the articulators and to measure their deformation during speech. While MRI is capable of visualizing the entire surfaces of those organs, it lacks the landmarks for motion tracking that are available with other techniques such as magnetic sensing methods. One possible way to obtain both surface contours and landmarks of the articulators is to combine different imaging techniques. In this paper, we propose a new method to add surface markers to dynamic MRI of the tongue by combining cine- and tagged-MRI data. To do so, we analyzed images from the two types of scans acquired in the same session. The intersection points of the tag lines with the tongue surface contour were extracted from the tagged-MRI data and mapped onto the cine-MRI data. After the small mapping errors were minimized, the results showed that marker tracking on both the oral and pharyngeal surfaces of the tongue was successful.
Index Terms: MRI data standardization, surface, landmark, cine-MRI, tagged-MRI, registration
1. Introduction
Speech is the most preferred way for people to communicate with each other. To convey a message, various linguistic sounds are produced by controlling the configuration of the vocal tract. The articulators determine the resonance characteristics of the vocal tract during speech production.
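As a rough illustration of the marker-extraction idea stated in the abstract (intersecting tag lines with the tongue surface contour), the following minimal Python sketch may help. It is not the authors' implementation. It assumes that the tongue contour is already available as a 2-D polyline and that each tag line can be approximated by a straight segment in image coordinates. The function names `segment_intersection` and `tag_surface_markers` are hypothetical, and the later mapping of markers onto cine-MRI frames is not covered.

```python
import numpy as np


def segment_intersection(p1, p2, q1, q2, eps=1e-12):
    """Return the crossing point of segments p1-p2 and q1-q2, or None.

    Solves p1 + t*(p2 - p1) = q1 + u*(q2 - q1) for t and u.
    """
    r = p2 - p1
    s = q2 - q1
    denom = r[0] * s[1] - r[1] * s[0]  # 2-D cross product r x s
    if abs(denom) < eps:
        # Parallel or collinear segments are treated as not crossing.
        return None
    d = q1 - p1
    t = (d[0] * s[1] - d[1] * s[0]) / denom  # position along p1-p2
    u = (d[0] * r[1] - d[1] * r[0]) / denom  # position along q1-q2
    # Half-open in u so that a crossing exactly at a shared contour vertex
    # is reported once; the very last contour point is therefore excluded.
    if 0.0 <= t <= 1.0 and 0.0 <= u < 1.0:
        return p1 + t * r
    return None


def tag_surface_markers(contour, tag_lines):
    """Collect crossings of straight tag lines with a contour polyline.

    contour   : sequence of (x, y) points along the surface (assumed input)
    tag_lines : sequence of ((x0, y0), (x1, y1)) segments (assumed input)
    Returns an array of shape (n_markers, 2), ordered by tag line and then
    by position along the contour.
    """
    contour = np.asarray(contour, dtype=float)
    markers = []
    for a, b in tag_lines:
        a = np.asarray(a, dtype=float)
        b = np.asarray(b, dtype=float)
        for i in range(len(contour) - 1):
            pt = segment_intersection(a, b, contour[i], contour[i + 1])
            if pt is not None:
                markers.append(pt)
    return np.array(markers, dtype=float).reshape(-1, 2)


if __name__ == "__main__":
    # Toy contour shaped like a tent, crossed by one vertical and one
    # horizontal tag line (illustrative values only, not measured data).
    toy_contour = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
    toy_tags = [((0.5, -1.0), (0.5, 2.0)), ((-1.0, 0.5), (3.0, 0.5))]
    print(tag_surface_markers(toy_contour, toy_tags))
```

In practice tag lines deform with the tissue, so a real pipeline would more likely represent each tag line as a polyline as well and intersect it with the contour segment by segment; the straight-segment form is used here only to keep the sketch short.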
While articulatory data provide a stream of information that underlies speech signals, visualizing the movement of the articulators is technically difficult. Since most speech articulators lie inside the human body, the techniques used to measure articulatory movement need customization, as seen in previous work using the x-ray microbeam system (X-ray) [1], electromagnetic articulography (EMA) [2] and ultrasonography (USG) [3]. These techniques are capable of capturing articulatory information at high sampling rates, even though they are often invasive. None of these modalities, however, offers a complete view of all the vocal-tract articulators at a sufficiently high spatial resolution, and the articulatory information is limited to the anterior half of the vocal tract.
Recently, the development of MRI techniques has allowed examination of the entire vocal tract during speech production and provides a powerful means of quantifying the configuration of the articulators, including the morphological characteristics of speakers in conjunction with their articulation and acoustics. Among those, it is common to find single-modality studies [2, 4, 5], rather than multi-modality ones [6, 7].
The analysis of place of articulation using motion images has been a technical issue in experimental phonetics because effective methods for motion analysis are not available for such data. Partly because of this problem, many experimental studies on speech articulation have employed techniques that track markers placed on the articulators’ surface, as seen in the studies with the X-ray microbeam system or the magnetic sensing system. However, among the hidden organs, those techniques measure only the oral surface of the tongue, and information from the pharyngeal surface is left unmeasured. In contrast, MRI motion imaging excels at imaging the entire shape of the tongue surface, while it lacks the functionality for marker tracking.
As a result, image analysis of the tongue surface has often been limited to the classical method of tracking the highest point of the tongue.
This paper proposes a solution that realizes tongue-surface motion analysis using the combined technique of synchronized cine- and tagged-MRI. Cine-MRI [8] is well suited to visualizing each component of the system during speech. Tagged-MRI [7] is a motion imaging technique that tracks tissue deformation by visualizing tag lines marked on the soft tissue. The tagged motion images can also provide surface marker points by detecting the intersection points of the tag lines with the tongue surface outline. Those marker points can then be mapped onto the cine-MRI data frame by frame to obtain motion images of the articulators with marker points on the tongue.
This paper is organized as follows. Section 2 describes the materials and methods. Section 3 presents the results with some remarks. Section 4 concludes the paper.
2. Materials and methods
2.1. Subjects and stimuli
MRI datasets from two subjects (a 23-year-old male and a 24-year-old female) were used in this study. Both subjects are native Mandarin speakers and reported no history of speech or language disorders. Each speaker lay supine in the MR scanner, and the speaker’s head was padded with foam rubber to minimize head movement. The participants repeated utterances of two-syllable Chinese words, such as /midu/ or /mune/, during data acquisition.
* Corresponding author
Copyright © 2015 ISCA. INTERSPEECH 2015, September 6-10, 2015, Dresden, Germany. doi: 10.21437/Interspeech.2015-155
2.2. Data acquisition
Both cine-MRI and tagged-MRI datasets were obtained during the same scan session using the Siemens Verio 3T installed at