Cepstrum-based estimation of resonance frequencies (formants) in high-pitch singing signals

C. Zarras¹, K. Pastiadis², G. Papadelis³, G. Papanikolaou⁴
¹ Aristotle University of Thessaloniki, 54124 Thessaloniki, E-Mail: chzarras@auth.gr
² Aristotle University of Thessaloniki, Dept. of Music Studies, 540 06 Thessaloniki, E-Mail: pastiadi@mus.auth.gr
³ Aristotle University of Thessaloniki, Dept. of Music Studies, 540 06 Thessaloniki, E-Mail: papadeli@mus.auth.gr
⁴ Aristotle University of Thessaloniki, 54124 Thessaloniki, E-Mail: pap@eng.auth.gr

Abstract

The estimation of vocal tract resonance frequencies from acoustic voice signals has been widely employed, and various methods have been proposed. Among them, a number of cepstrum-based techniques have been implemented to disentangle the voice's spectral envelope from the harmonic components. Noticeably less research has been conducted on voices with higher fundamental frequency, as in singing (e.g. soprano voices). In such cases, the estimation of the spectral envelope is affected by the presence of cepstral rahmonics, which are interleaved with the spectral envelope information. In this paper, some new techniques based on cancellation of rahmonics, rather than hard liftering, are proposed and examined for their effectiveness in maintaining the spectral envelope information. Both straightforward implementations and iterative procedures are considered, and simulation results for various configurations of f0 and formant frequencies are presented. These preliminary examinations allow the effects of various acoustical and signal processing factors on estimation accuracy to be evaluated, and the feasibility of the proposed approaches to be assessed for use with high fundamental frequency signals, such as singing, and in other similar fields of interest in musical acoustics.
Introduction

Formants are defined as the resonance frequencies of the stomatopharyngeal filter or, in other words, the local maxima of the filter's transfer function. Formant frequency estimation is a common task in speech processing and is of great interest due to its application in various fields such as speech encoding and speech and speaker recognition. For harmonic speech signals with low fundamental frequency, the harmonics are close enough to each other that formants can be obtained from the peaks of the spectral envelope. In contrast, when f0 is high, the spacing between harmonics increases and obtaining an accurate representation of the stomatopharyngeal filter's transfer function becomes more complicated. In such cases, spectral maxima and formants do not necessarily coincide. This problem is apparent in the singing signals of women and children, and even more so in soprano voices, whose fundamental frequencies can exceed 1 kHz.

A number of methods have been proposed for the estimation of formants. Among them are the LPC method [1], which is based on linear prediction of speech, and some cepstrum-based techniques [2], which are based on the signal's real cepstrum. However, both types of methods have disadvantages which become apparent as the fundamental frequency increases [3]. Formants estimated with LPC tend to follow the spectral peaks, i.e. the harmonics, ignoring the true vocal tract resonances. On the other hand, in high-pitch signals, cepstral rahmonics coexist with spectral envelope information in the lower part of the cepstrum, making the disentanglement rather difficult. Other methods based on 2D time-frequency representations [4] or the true envelope approach [5] have also been proposed.

In this preliminary study, a new approach to cepstral liftering is presented and tested with synthetic voice signals. The aim is to minimize the drawbacks of the cepstrum when it is used with high-pitched voice signals.
A new voice generation application, based on the LF model [6], was developed for the tests. The test results and prospects for further investigation are discussed.

Methods and Results

The limited usage of cepstrum-based algorithms in high-pitch singing signals is due to the characteristics of the cepstral representation. The main principle of these methods is that the spectral envelope information is mainly located at the low-order cepstral coefficients (low quefrencies), while the harmonic information extends to higher quefrencies. When f0 is low, this separation makes their disentanglement easy. As the fundamental frequency of the tested signal increases, the distance between the start of the cepstrum and the first rahmonic, as well as the spacing between successive rahmonics, decreases. Because the lower quefrencies are necessary for the extraction of spectral envelope information, the cepstral coefficients at rahmonic positions within that region have to be set to zero. Consequently, the presence of rahmonics in that low-quefrency area, and their cancellation, distorts the envelope.

Our approach relies on the fact that lower rahmonics are more affected by spectral envelope information than higher ones. After all, this is the main reason why the lowest part of the cepstrum is used for the calculation of the smoothed spectral envelope. Therefore, one of the later (higher-order) rahmonics, which are considered "clearer", i.e. whose values are less affected by the filter's transfer function, can be subtracted from the first rahmonics in order to minimize the harmonic information in the final smoothed spectral envelope representation. The algorithm requires a reasonably accurate estimate of the fundamental frequency as a first step, since the rahmonics have to be located.

DAGA 2010 - Berlin 661
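The idea described in the Methods text above (locate the rahmonics from the f0 estimate, then subtract a higher-order rahmonic from the first ones before computing the smoothed envelope) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the parameters `ref`, `half_width`, `gain` and `cutoff`, and the synthetic test frame are all hypothetical, and the text does not specify how the subtracted rahmonic should be scaled or windowed.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)  # offset avoids log(0)
    return np.fft.ifft(log_mag).real

def rahmonic_positions(f0, fs, n, count):
    """Sample indices of the first `count` rahmonics (multiples of fs / f0)."""
    period = fs / f0
    return [int(round(k * period)) for k in range(1, count + 1)
            if round(k * period) < n // 2]

def subtract_reference_rahmonic(cep, positions, ref, half_width=1, gain=1.0):
    """Subtract the neighbourhood of rahmonic `ref` from each earlier rahmonic.

    Assumption of this sketch: `ref`, `half_width` and `gain` are free
    parameters, and all rahmonic windows lie strictly inside (0, n/2).
    """
    out = cep.copy()
    n = len(cep)
    offsets = np.arange(-half_width, half_width + 1)
    template = gain * cep[positions[ref] + offsets]
    for pos in positions[:ref]:
        idx = pos + offsets
        out[idx] -= template
        out[(n - idx) % n] -= template  # mirror side keeps the cepstrum even
    return out

def smoothed_log_envelope(cep, cutoff):
    """Log-magnitude envelope from the quefrencies below `cutoff` samples."""
    n = len(cep)
    lifter = np.zeros(n)
    lifter[:cutoff] = 1.0
    lifter[n - cutoff + 1:] = 1.0  # mirror of indices 1 .. cutoff-1
    return np.fft.fft(cep * lifter).real

# Hypothetical test frame: a windowed harmonic series at f0 = 1 kHz.
fs, f0, n = 16000, 1000.0, 512
t = np.arange(n) / fs
frame = sum(np.cos(2 * np.pi * k * f0 * t) / k for k in range(1, 8)) * np.hanning(n)

cep = real_cepstrum(frame)
pos = rahmonic_positions(f0, fs, n, count=6)
cleaned = subtract_reference_rahmonic(cep, pos, ref=5)
envelope = smoothed_log_envelope(cleaned, cutoff=40)
```

The mirrored update in `subtract_reference_rahmonic` is a design choice of this sketch: the real cepstrum is an even sequence, so modifying both halves keeps the reconstructed log envelope real-valued. In practice `gain` would probably need to account for the decay of rahmonic amplitude with order, which is left open here.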