arXiv:2111.03629v1 [cs.SD] 5 Nov 2021

OBJECTIVE MEASUREMENT OF PITCH EXTRACTORS’ RESPONSES TO FREQUENCY MODULATED SOUNDS AND TWO REFERENCE PITCH EXTRACTION METHODS FOR ANALYZING VOICE PITCH RESPONSES TO AUDITORY STIMULATION

Hideki Kawahara⋆, K. Yatabe†, K. Sakakibara‡, T. Kitamura¶, H. Banno$, M. Morise§

⋆ Wakayama University, Japan ∗
† Waseda University, Japan
‡ Health Science University of Hokkaido, Japan
¶ Konan University, Japan
$ Meijo University, Japan
§ Meiji University, Japan

⋆ kawahara@sys.wakayama-u.ac.jp, † k.yatabe@asagi.waseda.jp, ‡ kis@hoku-iryo-u.ac.jp, ¶ t-kitamu@konan-u.ac.jp, $ banno@meijo-u.ac.jp, § mmorise@meiji.ac.jp

ABSTRACT

We propose an objective measurement method for pitch extractors’ responses to frequency-modulated signals. The method simultaneously measures the linear time-invariant response, the non-linear time-invariant response, and random and time-varying responses. It uses extended time-stretched pulses combined by binary orthogonal sequences. Our recent finding of an involuntary voice pitch response to auditory stimulation during voicing motivated this proposal. The involuntary voice pitch response provides a means to investigate voice chain subsystems individually and objectively. This response analysis requires reliable and precise pitch extraction. Using the proposed method, we found that existing pitch extractors failed to correctly analyze the signals used for auditory stimulation. Therefore, we propose two reference pitch extractors based on instantaneous frequency analysis and multi-resolution power spectrum analysis. The proposed extractors correctly analyze the test signals. We open-sourced MATLAB code for measuring pitch extractors and for conducting the voice pitch response experiment in our GitHub repository.

Index Terms: Voice chain, fundamental frequency, frequency modulation, time-stretched pulse, linear time invariant system

1. INTRODUCTION

We reported that the voice fundamental frequency (fo)¹ involuntarily shows a compensatory response to frequency modulation of auditory stimuli applied while voicing [2]. Measuring the voice response to auditory stimulation requires measuring the fo of periodic sounds² without introducing nonlinearities and glitches. This requirement led us to propose an objective measurement method³ of pitch extractors’ response to frequency-modulated tones⁴. We assigned the modulation signal as input and the extracted pitch value as output and fed them to the proposed pitch extractor measurement method. We found that existing fo extractors fail to meet this requirement. This issue motivated us to introduce reference pitch extractors.

∗ This work was supported by JSPS (Japan Society for the Promotion of Science) Grants-in-Aid for Scientific Research, Grant Numbers JP18K00147, JP18K10708, JP19K21618, and JP21H04900.
¹ We use fo (pronunciation “ef oh”) to represent the fundamental frequency [1] instead of conventional symbols such as F0.
² It is difficult to extract fo from sounds without a fundamental component (“missing fundamental”).
³ The proposed pitch extractor measurement uses a new system response analysis method (hereafter, the CAPRICEP-based method; CAPRICEP stands for Cascaded All-Pass filters with RandomIzed CEnter frequencies and Phase polarity [3]). The CAPRICEP-based method simultaneously measures the linear time-invariant (LTI) response, the non-linear time-invariant (non-LTI) response, and random and time-varying responses.
⁴ Strictly, using “pitch” to represent fo [4] is misleading. What we perceive is “pitch,” a psychological attribute that correlates highly with fo. However, we do not distinguish between pitch and fo in this article unless doing so introduces confusion.

The contributions of this article are as follows.
We introduced a new objective measurement method for pitch extractors, which provides useful supplemental information to existing evaluation measures. We also introduced two reference pitch extractors.

2. BACKGROUND

The first author introduced an altered auditory feedback technique⁵ and reported, a quarter-century ago, that voice pitch control consists of two responses to feedback pitch modification: the voluntary and the involuntary responses [5, 6]. Despite decades of research, altered feedback remains an active topic for investigating the speech chain and its underlying neural basis [7–11]. This research has mainly focused on voluntary responses, represented by the pitch shift paradigm and the adaptation paradigm [12–14].⁶ The introduction of the CAPRICEP-based method [3] and of the voice pitch response to auditory stimulation that is not altered feedback of the voice opened a new possibility [2]. Combining the method with the response to non-feedback sounds solved difficulties in investigating the involuntary response [2, 15].

The experimental procedure for measuring the voice pitch response to FM test sounds consists of the following steps.

Generation of test signal: Combine orthogonal sequences made from extended time-stretched pulses, followed by smoothing, to yield the modulation signal. We applied frequency modulation to four types of signals:
  SINE: a sinusoid,
  SINES: a sum of multiple harmonic sinusoids,
  MFND: a sum of multiple harmonic sinusoids without the fundamental component, and
  MFUNDH: a sum of multiple harmonic sinusoids without the first eight harmonic components.

Voicing with auditory stimulation: The subject keeps voicing at a constant pitch while listening to the test sound. The test system simultaneously records the produced voice and the test signal for auditory stimulation.
⁵ We use the term “altered auditory feedback” here because it is now common practice. We used the term “transformed auditory feedback” for our paradigm a quarter-century ago.
⁶ For detailed historical background and discussion of altered feedback research and its relation to the CAPRICEP-based method, refer to [2, 15].

Response analysis: Apply the pulse recovery and orthogonalization procedure to recover the stimulation pulse from the test signal