NOISE-ROBUST F0 ESTIMATION USING SNR-WEIGHTED SUMMARY CORRELOGRAMS
FROM MULTI-BAND COMB FILTERS
Lee Ngee Tan and Abeer Alwan
Department of Electrical Engineering, University of California, Los Angeles
{tleengee, alwan}@ee.ucla.edu
ABSTRACT
A noise-robust, signal-to-noise ratio (SNR)-weighted, correlogram-based pitch estimation algorithm (PEA) is proposed, in which a bank of comb filters operates in each of the low, mid, and high frequency bands. Correlograms are obtained by applying autocorrelation directly to the output of the low-frequency filterbank (FBK) and to the output envelopes of all three FBKs. An SNR-weighting scheme is used for channel selection to yield a summary correlogram for each FBK. These summary correlograms are averaged to obtain an overall summary correlogram, which is time-smoothed before peak extraction is performed. The final pitch contour is obtained via dynamic programming. The proposed PEA is evaluated on the Keele corpus with additive white or babble noise. In comparison with widely used PEAs, the proposed PEA has the lowest overall gross pitch error (GPE), especially at low SNRs.
Index Terms— Pitch estimation, correlogram, multi-band,
comb filtering, noise-robustness
1. INTRODUCTION
Information about the fundamental frequency (F0), or pitch, of voiced speech is required for many speech applications. Although F0 estimation is a well-researched topic, accurate F0 estimation in noise remains challenging. Pitch estimation algorithms (PEAs) can be broadly classified into three categories: 1) time-domain, 2) frequency-domain, and 3) time-frequency-domain. Time-domain PEAs directly exploit a signal's temporal periodicity; they include zero-crossing-rate, average magnitude difference function (AMDF), and autocorrelation-based methods [1–3]. Frequency-domain PEAs estimate F0 from the signal's short-time spectral harmonicity [4, 5]. Time-frequency-domain PEAs typically separate a signal into several frequency bands and then apply time-domain processing in each band. The auditory-model correlogram-based PEA is a popular time-frequency-domain method inspired by Licklider's duplex theory of pitch perception [6]. The signal is first decomposed into multiple frequency channels by an auditory filterbank that models the frequency analysis performed by the cochlea; gammatone auditory filters [7] are widely used for this purpose [8–11]. Autocorrelation is then applied either directly to each channel's output [10] or to its envelope. The envelope approach is generally used for mid- and high-frequency channels (center frequencies > 1 kHz) [8, 9]: their wide bandwidths pass multiple harmonics, which interact to produce envelopes that fluctuate at F0 (beats). Together, these multi-channel autocorrelations form the correlogram, from which a single F0 candidate, or possibly multiple candidates, are derived. Correlogram-based perceptual PEAs can yield estimates close to the pitch perceived by human listeners for signals with a missing fundamental, inharmonic complexes, and noise tones [12]. Because they are multi-band approaches, correlogram-based PEAs have the potential to be noise-robust, especially in the presence of colored noise.

Work supported in part by NSF and DARPA
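The generic correlogram pipeline described above can be sketched as follows. This is a minimal illustration of correlogram-based PEAs in general, not of the proposed comb-FBK method, and all function names are hypothetical. It assumes the channel signals are already available from some filterbank (no gammatone filters are implemented), extracts envelopes by half-wave rectification followed by a one-pole low-pass filter with an assumed 500 Hz cutoff, sums the normalized per-channel autocorrelations into a summary correlogram, and returns the highest peak within an assumed 60-400 Hz pitch range.

```python
import math

def envelope(x, fs, cutoff_hz=500.0):
    """Assumed envelope extractor: half-wave rectification, a one-pole low-pass
    filter, then mean removal so the autocorrelation tracks the F0 fluctuation."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)  # pole for the chosen cutoff
    y, out = 0.0, []
    for v in x:
        y = a * y + (1.0 - a) * max(v, 0.0)
        out.append(y)
    mean = sum(out) / len(out)
    return [v - mean for v in out]

def normalized_autocorr(x, max_lag):
    """r(tau) / r(0) for tau = 0..max_lag, summing over the available overlap."""
    r0 = sum(v * v for v in x) or 1.0
    return [sum(x[n] * x[n + tau] for n in range(len(x) - tau)) / r0
            for tau in range(max_lag + 1)]

def summary_correlogram(channels, use_envelope, fs, max_lag):
    """Autocorrelate each channel (or its envelope) and sum the rows across channels."""
    summary = [0.0] * (max_lag + 1)
    for x, env in zip(channels, use_envelope):
        row = normalized_autocorr(envelope(x, fs) if env else x, max_lag)
        summary = [s + r for s, r in zip(summary, row)]
    return summary

def pick_f0(summary, fs, fmin=60.0, fmax=400.0):
    """Return fs / lag of the largest summary value with lag inside the pitch range."""
    lo = math.ceil(fs / fmax)
    hi = min(int(fs / fmin), len(summary) - 1)
    best = max(range(lo, hi + 1), key=lambda tau: summary[tau])
    return fs / best

if __name__ == "__main__":
    fs, f0 = 8000, 200.0
    # Low channel: harmonics 1-4 used directly. High channel: harmonics 9-11, which beat at F0.
    low = [sum(math.cos(2 * math.pi * k * f0 * t / fs) for k in (1, 2, 3, 4)) for t in range(800)]
    high = [sum(math.cos(2 * math.pi * k * f0 * t / fs) for k in (9, 10, 11)) for t in range(800)]
    s = summary_correlogram([low, high], [False, True], fs, max_lag=int(fs / 60))
    print(pick_f0(s, fs))
```

The high channel illustrates why envelopes are used above 1 kHz: the individual harmonics near 2 kHz say little about F0 on their own, but their beating envelope repeats every F0 period, so its autocorrelation peaks at the same lag as the low channel's.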
Signal processing schemes employing comb filters have also been proposed for F0 estimation, especially in the presence of noise and harmonic disturbances. A spectral comb analysis technique [5], which cross-correlates the spectrum with a spectral comb function whose teeth decrease in amplitude and whose tooth spacing is varied, gives more accurate F0 estimates than a cepstrum-based PEA [13]. An adaptive comb filter was formulated in [14] for pitch estimation and harmonic enhancement in additive white noise. In the presence of overlapping periodic signals, an F0-tuned comb filter has been successfully applied to notch out or enhance one of the sources before F0 estimation is performed on the individual signals [15].
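As a rough illustration of the spectral comb idea attributed to [5] above, the sketch below scores each candidate F0 by summing the windowed magnitude spectrum at the comb teeth k·F0, with tooth amplitudes decreasing as 1/k, and sweeps the tooth spacing over candidate F0s. The 1/k weighting, the Hann window, the 1 Hz candidate grid, and the 2 kHz upper tooth limit are assumptions chosen for this example; the exact comb function of [5] may differ. The spectrum is evaluated by a direct (slow) DFT at integer frequencies to keep the example dependency-free, and the function names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(x, fs, f_max):
    """|X(f)| of a Hann-windowed frame at integer frequencies 0..f_max Hz (direct, slow DFT)."""
    n_len = len(x)
    xw = [(0.5 - 0.5 * math.cos(2 * math.pi * n / (n_len - 1))) * v for n, v in enumerate(x)]
    return [abs(sum(v * cmath.exp(-2j * math.pi * f * n / fs) for n, v in enumerate(xw)))
            for f in range(f_max + 1)]

def spectral_comb_f0(x, fs, f0_min=60, f0_max=400, f_max=2000):
    """Integer F0 (Hz) whose comb, with teeth at k*F0 weighted 1/k, best matches the spectrum."""
    mag = magnitude_spectrum(x, fs, f_max)

    def score(f0):
        # Teeth of decreasing amplitude: harmonics hit by high-k teeth count for less,
        # which penalizes subharmonic candidates (F0/2, F0/3, ...).
        return sum(mag[k * f0] / k for k in range(1, f_max // f0 + 1))

    return max(range(f0_min, f0_max + 1), key=score)

if __name__ == "__main__":
    fs = 8000
    frame = [sum(math.cos(2 * math.pi * k * 200 * n / fs) for k in (1, 2, 3, 4)) for n in range(400)]
    print(spectral_comb_f0(frame, fs))
```

The decreasing tooth amplitudes are what let the comb prefer the true F0 over its subharmonics: a candidate at F0/2 places its strongest (k = 1) tooth between harmonics, and the harmonics it does hit land on lower-weighted teeth.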
Motivated by the rich information in the correlogram representation and by the ability of comb filters to enhance or suppress harmonics, this paper proposes a multi-band comb FBK correlogram-based PEA. Section 2 details the proposed algorithm. Section 3 describes the performance evaluation criteria and setup, and Section 4 compares the results of the proposed method with those of other PEAs. The findings are summarized in Section 5.
2. PROPOSED METHOD
The block diagram in Fig. 1 summarizes the proposed PEA.
[Fig. 1 (block diagram): 8 kHz speech → 8192-pt FFT → low-freq (0-1 kHz), mid-freq (1-2 kHz), and high-freq (2-3 kHz) comb FBKs → compute SNR (based on inter-harmonic noise) and perform IFFT on selected channels → envelope extraction (the low-freq FBK output is also used directly) → compute SNR-weighted summary correlograms $R_s^{low}$, $R_s^{ev,low}$, $R_s^{ev,mid}$, $R_s^{ev,high}$ → compute smoothed overall summary correlogram $R_s^{smooth}$ → peak extraction and dynamic programming → F0 contour.]
Fig. 1. Block diagram of the proposed pitch estimation algorithm.
Multi-channel outputs are indicated by bold arrows.
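The combination stages in Fig. 1 (per-FBK SNR-weighted summary correlograms, their average, time smoothing, peak extraction, and dynamic programming) can be sketched as below. Every rule here is a hypothetical stand-in rather than the rule used by the proposed PEA: channels below an assumed SNR floor are discarded and the rest are weighted by linear SNR, the per-FBK summaries are averaged with equal weights, smoothing is a moving average over neighboring frames, and the DP is a Viterbi search whose local cost is the negative peak height and whose transition cost penalizes jumps in log2 lag. Per-channel SNRs are taken as given inputs; the inter-harmonic-noise SNR estimate shown in Fig. 1 is not implemented, and all function names are illustrative.

```python
import math

def fbk_summary(rows, snrs_db, snr_floor_db=0.0):
    """Hypothetical SNR rule: keep channels with SNR >= snr_floor_db and weight each
    kept autocorrelation row by its linear SNR; return the weighted mean row."""
    kept = [(row, 10.0 ** (s / 10.0)) for row, s in zip(rows, snrs_db) if s >= snr_floor_db]
    if not kept:  # no reliable channel in this FBK, so it contributes a flat zero row
        return [0.0] * len(rows[0])
    total = sum(w for _, w in kept)
    return [sum(w * row[tau] for row, w in kept) / total for tau in range(len(rows[0]))]

def overall_summary(fbk_rows):
    """Equal-weight average of the per-FBK summary correlograms of one frame."""
    return [sum(col) / len(fbk_rows) for col in zip(*fbk_rows)]

def smooth_in_time(frames, half_width=1):
    """Moving average of every lag over up to 2*half_width+1 neighbouring frames."""
    out = []
    for i in range(len(frames)):
        window = frames[max(0, i - half_width):i + half_width + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out

def peak_candidates(row, lag_lo, lag_hi, n_cand=3):
    """Up to n_cand highest local maxima in [lag_lo, lag_hi] as (height, lag) pairs.
    Requires 1 <= lag_lo and lag_hi <= len(row) - 2."""
    peaks = [(row[t], t) for t in range(lag_lo, lag_hi + 1)
             if row[t] > row[t - 1] and row[t] >= row[t + 1]]
    if not peaks:  # monotonic row: fall back to the global maximum in range
        peaks = [max((row[t], t) for t in range(lag_lo, lag_hi + 1))]
    return sorted(peaks, reverse=True)[:n_cand]

def dp_track(frames, fs, lag_lo, lag_hi, jump_cost=0.5):
    """Viterbi search over per-frame peak candidates. Local cost is the negative peak
    height; transition cost is jump_cost * |log2(lag ratio)|, so an octave jump costs jump_cost."""
    cands = [peak_candidates(row, lag_lo, lag_hi) for row in frames]
    cost = [-h for h, _ in cands[0]]
    back = []
    for prev, cur in zip(cands, cands[1:]):
        new_cost, ptr = [], []
        for h, lag in cur:
            steps = [cost[k] + jump_cost * abs(math.log2(lag / plag))
                     for k, (_, plag) in enumerate(prev)]
            k_best = min(range(len(steps)), key=steps.__getitem__)
            new_cost.append(steps[k_best] - h)
            ptr.append(k_best)
        cost = new_cost
        back.append(ptr)
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for ptr in reversed(back):  # follow back-pointers from the last frame to the first
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [fs / cands[i][j][1] for i, j in enumerate(path)]
```

In a toy sequence where one frame's strongest peak sits at twice the true lag (an octave error), a per-frame argmax would report half the true F0 for that frame, whereas the transition cost lets the DP keep the whole contour at the true lag.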
4464 978-1-4577-0539-7/11/$26.00 ©2011 IEEE ICASSP 2011