Synthesis of the laryngeal source of throat singing using a 2×2-mass model

Ken-Ichi Sakakibara ∗1, Hiroshi Imagawa ∗2, Seiji Niimi ∗3, Naotoshi Osaka ∗1
kis@brl.ntt.co.jp, imagawa@m.u-tokyo.ac.jp, niimi@iuhw.ac.jp, osaka@brl.ntt.co.jp

∗1 NTT Communication Science Laboratories, 3-1, Morinosato Wakamiya, Atsugi-shi, Kanagawa, 243-0198, Japan
∗2 Department of Speech Physiology, The University of Tokyo, 7-3-1, Hongou, Bunkyo-ku, Tokyo, 113-0033, Japan
∗3 Speech and Hearing Center, International University of Health and Welfare, 2600-6, Kitakanemaru, Ohtawara, 324-0011, Japan

Abstract

Singing voices have various timbres. Throat singing and some other Asian traditional singing voices have a pressed timbre that is significantly different from the European classical singing voice. In our previous study on throat singing, the vibration of the false vocal folds as well as that of the vocal folds was observed and was found to contribute essentially to the pressed timbre. This paper describes a 2×2-mass model as a physical model, defines an adduction parameterization of its parameters, and presents a simulation of vocal fold and false vocal fold vibrations in the larynx. Furthermore, a visual simulator of the laryngeal movements is demonstrated. By using this model, the vibration patterns of the two different laryngeal voices in throat singing (the squeezed and kargyraa voices) and the normal pressed voice have been simulated. The results show the possibility of synthesizing various timbres for singing.

1 Introduction

The singing voice has numerous variations of timbre. There are considerable differences, for instance, between European classical singing voices, such as bel canto and German lied, and the Asian traditional pressed singing voices, such as throat singing, Japanese Youkyoku, and Korean Pansori.

The laryngeal source is an essential factor in determining the timbre of the singing voice, especially for the pressed quality.
In general, the pressed quality is obtained by excessive adduction of the supraglottal structure. The laryngeal adjustments in Asian traditional pressed singing differ considerably from those in European classical singing [5, 6, 9].

Synthesizing such varying timbres in singing voices requires a flexible laryngeal source model. A glottal waveform model allows us to control its parameters to approximate the perception of voice [8, 10]. On the other hand, a physical model allows us to control its parameters according to the physical and physiological mechanisms of laryngeal adjustment. Based on physiological observations, we have constructed a 2×2-mass model as a physical model, devised by attaching a two-mass model for the false vocal folds to the ordinary two-mass model for the vocal folds [3, 10].

In this paper, after summarizing the physiological observations in throat singing, we describe the mechanism of the 2×2-mass model and its adduction parameterization. We also present a visual simulation tool for the model. Finally, using the model, we simulate the laryngeal sources of throat singing and the normal pressed voice.

2 Laryngeal source in throat singing

2.1 Throat singing

Throat singing is a traditional singing style of the people who live around the Altai mountains. Khöömei in Tyva and Khöömij in Mongolia are representative styles of throat singing. Throat singing is sometimes called biphonic singing or overtone singing because two or more distinct pitches (musical lines) are produced simultaneously in one tone. One is a low sustained fundamental pitch, called a drone, and the second is a whistle-like harmonic that resonates high above the drone. The production of the high-pitched overtone is mainly due to the pipe resonance of the cavity from the larynx to the point of articulation in the vocal tract [1]. On the other hand, the laryngeal voice of throat singing has a special pressed timbre and supports the generation of the overtone.
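
The relation between the drone and the overtone described above can be illustrated with a small calculation: the overtone is one harmonic of the drone, namely the one that lies closest to a vocal tract resonance. The following is a minimal sketch under stated assumptions. The drone frequency, the resonance frequency, and the function names `harmonics` and `nearest_harmonic` are hypothetical values and names chosen for illustration, not measurements or tools from this study.

```python
def harmonics(f0_hz, max_hz):
    """Return the harmonic frequencies k * f0_hz that do not exceed max_hz."""
    count = int(max_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


def nearest_harmonic(f0_hz, resonance_hz):
    """Return (k, k * f0_hz) for the harmonic closest to resonance_hz."""
    # The harmonic number is at least 1 (the fundamental itself).
    k = max(1, round(resonance_hz / f0_hz))
    return k, k * f0_hz


if __name__ == "__main__":
    drone_hz = 100.0       # assumed drone fundamental (illustrative only)
    resonance_hz = 1230.0  # assumed vocal tract resonance (illustrative only)

    k, overtone_hz = nearest_harmonic(drone_hz, resonance_hz)
    print(f"Harmonics up to 1500 Hz: {harmonics(drone_hz, 1500.0)}")
    print(f"Emphasized overtone: harmonic {k} at {overtone_hz} Hz")
```

In this simplified picture, moving the resonance (by changing the articulation) selects a different harmonic of the same drone, which is heard as the whistle-like melody.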

The laryngeal voices of throat singing can be classified as squeezed and kargyraa based on the listener's impression, acoustical characteristics, and the singer's personal observation on voice production. The squeezed voice is the basic laryngeal voice in throat singing and is used as the drone. The kargyraa voice is a very low-pitched voice that falls outside the modal register.

2.2 False vocal folds

The false vocal folds (ventricular folds) are a pair of soft and flaccid folds which attach to the anterolateral surface of the arytenoid cartilages (Fig. 1). While the vocal folds (VFs) have a mechanism that changes their stiffness, thickness, and length by means of the muscles (mainly by the action of the thyroarytenoid muscle), the false vocal