The Voice Fundamental Frequency Statistical Parameters under Noisy Conditions with the Cepstrum Method

Jovan Galić¹, Tatjana Pešić-Brđanin²

Abstract – The influence of white noise on the basic statistical parameters (the mean, the standard deviation, the median and the mode) of the speech signal fundamental frequency is analyzed in this paper. The fundamental frequency is determined in MATLAB using the cepstrum method. It is shown that the median can be used as the best estimator of the fundamental frequency.

Keywords – Fundamental frequency, cepstrum, statistical parameters

I. INTRODUCTION

In an era of explosive development of information technologies, speech is still the most natural and most convenient way to communicate. In recent years there has been a great expansion in the research, development and application of speech technologies (interactive voice response and portals, process management, automatic translation from one language to another, voice authentication, etc.) [1].

One of the most important individual acoustic characteristics of a speaker is the voice fundamental frequency (F0). The fundamental frequency of the voice is not a stable parameter for a given speaker, i.e. no person has an exact, fixed value of the voice fundamental frequency. It changes during speech, so the term usually refers to an average value, typically the arithmetic mean. Many different methods can be used to determine the fundamental frequency of the speech signal [2,3]. Accurate determination of the fundamental frequency is very important because the human perceptual mechanism is highly sensitive to its value. Because the voice fundamental frequency is non-stationary, other statistical parameters related to it (extent of change, standard deviation, statistical distribution, etc.) must also be taken into consideration.
It is particularly important to determine the fundamental frequency when the speech signal is, to some extent, masked by noise. There are several widely used methods to estimate the fundamental frequency: the auto-correlation, the cross-correlation and the cepstrum method [2,3]. In our previous work [4], we showed that the cepstrum method estimates the fundamental frequency of a speech signal masked by white noise more precisely than the other two methods. The cepstrum method is therefore used in this paper to determine the fundamental frequency.

¹ Jovan Galić is with the Faculty of Electrical Engineering, University of Banja Luka, Patre 5, 78000 Banja Luka, Republika Srpska, Bosnia and Herzegovina, E-mail: jgalic@etfbl.net
² Tatjana Pešić-Brđanin is with the Faculty of Electrical Engineering, University of Banja Luka, Patre 5, 78000 Banja Luka, Republika Srpska, Bosnia and Herzegovina, E-mail: tatjanapb@etfbl.net

The cepstrum is the result of taking the Fourier transform (FT) of the decibel spectrum. The word "cepstrum" was derived by reversing the first four letters of "spectrum". The power cepstrum is defined as:

$$\left|\mathcal{F}\left\{\log\left(\left|\mathcal{F}\{\text{signal}\}\right|^{2}\right)\right\}\right|^{2} \qquad (1)$$

It is often described by the algorithm:

signal → FT → abs() → (·)² → log → FT → abs() → (·)² → cepstrum

Due to the properties of the logarithm, a product in the spectral domain is transformed into a sum in the cepstrum domain. The cepstrum of a voiced phoneme has a strong local peak corresponding to the fundamental period [5]. The cepstrum method is characterized by high accuracy and high numerical complexity [5].

The main aim of this paper is to investigate how accurately the cepstrum method determines the speech signal fundamental frequency and its statistical parameters in the presence of white noise. The most important statistical parameters of the fundamental frequency are taken as measures for the estimation: the mean, the standard deviation, the median and the mode.
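The power cepstrum chain of Eq. (1) (Fourier transform, squared magnitude, logarithm, Fourier transform, squared magnitude) can be illustrated with a minimal Python sketch. It is not the MATLAB code used in this paper. The function names, the 60 to 400 Hz search range, the small constant added before the logarithm, the use of the natural logarithm and the absence of a window function are all assumptions made for illustration. A small recursive radix-2 FFT is included only to keep the sketch free of external libraries.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_cepstrum(frame):
    """signal -> FT -> |.|^2 -> log -> FT -> |.|^2, as in Eq. (1)."""
    eps = 1e-12  # assumption: small floor that avoids log(0)
    power_spectrum = [abs(c) ** 2 for c in fft(frame)]
    log_spectrum = [math.log(p + eps) for p in power_spectrum]
    return [abs(c) ** 2 for c in fft(log_spectrum)]


def estimate_f0(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate F0 in Hz from the largest cepstral peak.

    The peak is searched only at quefrencies (lags in samples) that
    correspond to f_min..f_max; this range is an assumed default.
    """
    cep = power_cepstrum(frame)
    q_lo = int(fs / f_max)
    q_hi = min(int(fs / f_min), len(cep) // 2)
    peak = max(range(q_lo, q_hi + 1), key=lambda q: cep[q])
    return fs / peak


if __name__ == "__main__":
    fs = 22050
    # Synthetic test frame: one impulse every 105 samples.
    frame = [1.0 if n % 105 == 0 else 0.0 for n in range(1024)]
    print(estimate_f0(frame, fs))
```

For the synthetic impulse train with a period of 105 samples, the estimate should be 22050/105 = 210 Hz, because the cepstral peak falls at the lag equal to the fundamental period. Real speech frames would normally be windowed first, a step the sketch leaves out.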
In this study, a more extensive statistical sample than in [4] is analyzed to determine the statistical parameters. We also examine which statistical parameter of the fundamental frequency is the most accurate estimator.

II. EXPERIMENT

A. The experiment conditions

Sound recordings of ten female and ten male speakers (aged 20 to 35 years) are used as the material for analysis. The Sound Forge software package [6] is used for recording. The recordings were made under laboratory conditions in the presence of normal ambient noise (external noise levels up to 45 dB(A)), with an integrated 16-bit sound card and a sampling frequency of 22.05 kHz. The speakers uttered the vowel /a/ in Serbian. A half-second segment with a stable fundamental frequency contour is selected for analysis. The MATLAB software package [7] is used to determine the fundamental frequency by the cepstrum method, with a frame window length of 1024 samples (46.4 ms) and an offset of 256 samples (11.6 ms).
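The framing parameters above can be checked with a short Python sketch, which also shows one way to compute the four statistical parameters from per-frame F0 values. It is only an illustration, not the procedure used in this paper. The function names are hypothetical, the sample standard deviation (n - 1 normalization) is assumed, and the F0 values are rounded to 1 Hz bins before taking the mode, which is also an assumption because the paper does not state how the mode is obtained.

```python
import statistics

FS = 22050          # sampling frequency in Hz (from the text)
FRAME_LEN = 1024    # frame window length in samples (from the text)
HOP = 256           # frame offset in samples (from the text)


def frame_starts(n_samples, frame_len=FRAME_LEN, hop=HOP):
    """Start indices of all complete frames that fit in the segment."""
    if n_samples < frame_len:
        return []
    return list(range(0, n_samples - frame_len + 1, hop))


def f0_statistics(f0_values, bin_hz=1.0):
    """Mean, standard deviation, median and mode of per-frame F0 values.

    Assumptions: sample standard deviation, and values rounded to
    bin_hz before the mode is taken.
    """
    binned = [round(f / bin_hz) * bin_hz for f in f0_values]
    return {
        "mean": statistics.mean(f0_values),
        "std": statistics.stdev(f0_values),
        "median": statistics.median(f0_values),
        "mode": statistics.mode(binned),
    }


if __name__ == "__main__":
    # Frame and offset durations in milliseconds.
    print(round(1000 * FRAME_LEN / FS, 1), round(1000 * HOP / FS, 1))
    # Number of complete frames in a half-second segment.
    print(len(frame_starts(int(0.5 * FS))))
    # Hypothetical per-frame F0 values in Hz, one of them an outlier.
    print(f0_statistics([200.0, 202.0, 202.0, 204.0, 250.0]))
```

With these parameters, a half-second segment (11025 samples) holds 40 complete frames. In the hypothetical list, the single outlier pulls the mean to 211.6 Hz, while the median and the mode both stay at 202 Hz.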