Construction and Analysis of Word-level Time-aligned Simultaneous Interpretation Corpus

Takahiro Ono 1, Hitomi Tohyama 2, Shigeki Matsubara 2
1 Graduate School of Information Science, Nagoya University
2 Information Technology Center, Nagoya University
Furo-cho, Chikusa-ku, Nagoya-shi, 464-8601, Japan
{ono, hitomi, matubara}@el.itc.nagoya-u.ac.jp

Abstract

This paper describes quantitative analyses of the delay in Japanese-to-English (J-E) and English-to-Japanese (E-J) interpretations. The Simultaneous Interpretation Database of Nagoya University (SIDB) was used for the analyses. The start and end times of each word were added to the corpus using HMM-based phoneme segmentation, and the time lag between corresponding words was calculated as the word-level delay. Word-level delay was calculated for 3,722 word pairs in J-E interpretation and 4,932 word pairs in E-J interpretation. The analyses revealed that J-E interpretation has a much larger delay than E-J interpretation and that the difference in word order between Japanese and English affects the degree of delay.

1. Introduction

Simultaneous interpretation (SI) is a mode of interpretation in which the interpreter renders the message in the target language while the source-language speaker continues to speak. It is widely used in international society for its inherent advantages: it is highly time-efficient and rarely disturbs the source-language speaker. Although the SI interpreter and the speaker speak in parallel, the interpreter’s utterances always lag behind the speaker’s, because the interpreter must first grasp the speaker’s message. Since a large delay burdens the interpreter’s memory, which could lower the interpretation quality (Mizuno, 2005), it is essential for interpreters to control the delay properly.

The delay is heavily affected by the source and target languages.
Because Japanese and English differ considerably in word order, Japanese-to-English (J-E) and English-to-Japanese (E-J) interpretations are considered difficult. However, few quantitative analyses of these interpretations have been conducted.

This paper discusses quantitative analyses of the delay in J-E and E-J interpretations. The Simultaneous Interpretation Database of Nagoya University (SIDB) (Matsubara et al., 2002) was used for the analyses. We used word-level delay to observe the delay within utterances. To measure the delay efficiently, word-level temporal information and translation correspondences were estimated for the SIDB. The analyses revealed that J-E interpretation has a large delay, along with other delay characteristics of J-E and E-J interpretations.

2. Corpus

The Simultaneous Interpretation Database of Nagoya University (SIDB) (Matsubara et al., 2002) was used in this research. The corpus consists of monologue data (lectures) and dialogue data, both accompanied by J-E and E-J interpretations. A part of the monologue data was used for the analysis. The statistics of the data used are shown in Tables 1 and 2.

Table 1: Statistics of Japanese lectures and J-E interpretations

                        Lecture    Interpretation
# of lectures           8          13
# of utterance units    3,864      7,461
# of words              24,415     30,026
# of distinct words     2,414      2,976

Table 2: Statistics of English lectures and E-J interpretations

                        Lecture    Interpretation
# of lectures           12         20
# of utterance units    4,103      7,603
# of words              20,995     44,792
# of distinct words     3,225      3,146

Figure 1: Recording environment of SIDB

The interpreter’s speech was recorded in an environment close to a real one: sitting in a soundproof booth, the interpreter speaks into a microphone while clearly seeing the speaker and hearing him/her via earphones. The speaker could not hear the interpreter’s speech, so he/she could speak at his/her own pace.
Figure 1 shows the recording envi-