Adaptive Post-Filtering Controlled by Pitch Frequency for CELP-based Speech Coder
Hironobu Chiba*, Yutaka Kamamoto†, Takehiro Moriya†, Noboru Harada†, Shigeki Miyabe*, Takeshi Yamada* and Shoji Makino*
* Graduate School of Systems and Information Engineering, University of Tsukuba, Japan
Email: chiba@mmlab.cs.tsukuba.ac.jp
† NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Japan
Abstract—Most speech codecs use a post-filter that emphasizes pitch structures to enhance perceptual quality at the decoder. In particular, the bass post-filter used in ITU-T G.718 applies an adaptive pitch enhancement technique to a fixed lower frequency band. This paper describes a new post-filtering method in which the frequency band and the gain of the bass post-filter are adaptively controlled frame by frame, depending on the pitch frequency of the decoded signal, to improve bass post-filter performance. Objective and subjective evaluations confirmed that the developed method improves speech quality.
I. INTRODUCTION
Speech codecs, especially standard technologies based on CELP (Code Excited Linear Prediction), apply post-processing to the decoded signal in order to enhance its perceptual quality [1], [2], [3]. One example of such processing is a post-filter that emphasizes the formant and pitch structures [4], [5]. Whereas conventional post-filters for narrowband signals sampled at 8 kHz have been applied up to the Nyquist frequency, the ITU-T G.718 decoder applies such a post-filter, called the bass post-filter, only to the lower frequency band of wideband speech sampled at 16 kHz [6]. This paper presents an adaptive control method in which the bandwidth of the post-filter is determined by the observed pitch frequency (F0, or fundamental frequency). The conventional method implemented in G.718 applies the post-filtering below a fixed cut-off frequency. In contrast, the developed approach controls the cut-off frequency adaptively, frame by frame. To assess the quality improvements, we conducted ITU-T P.862 PESQ (Perceptual Evaluation of Speech Quality) [7] and ITU-R BS.1534 MUSHRA (MUlti Stimulus test with Hidden Reference and Anchor) [8] experiments. The test results show that the developed method statistically improves the quality of the decoded speech signal. This method is applicable to most CELP-based speech codecs.
II. PROCESSING OF BASS POST-FILTER
Figure 1 shows a block diagram of the bass post-filter used in G.718. This post-filter emphasizes the pitch structure of the decoded signal $\hat{s}(n)$ only in the lower frequency band, operating on each 10-ms frame (hereafter referred to as a processing frame).
First, $s_p(n)$ is calculated by the following equation:
$$s_p(n) = 0.5\,\hat{s}(n-\tau) + 0.5\,\hat{s}(n+\tau) \qquad (1)$$
Fig. 1. Block diagram of bass post-filter used in G.718.
where $\tau$ is the pitch period obtained by the pitch-tracking block. Equation (1) means that the emphasized signal $s_p(n)$ is generated by two-sided long-term prediction. To allow the pitch prediction of equation (1) to operate properly in all cases, unavailable samples (for example, future samples $\hat{s}(n+\tau)$ that lie beyond the end of the decoded data) are extrapolated according to the following rule:
$$\hat{s}(n+L) = \hat{s}(n+L-\tau) \qquad (2)$$
Here, $L$ is set to the value required by the pitch prediction.
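To make equations (1) and (2) concrete, the following Python sketch computes $s_p(n)$ for one processing frame. It is a minimal illustration under assumptions not stated in the text: the pitch period $\tau$ is an integer number of samples, the decoded samples are held in a plain list whose first part is at least $\tau$ samples of past history, and the helper names `extrapolate` and `two_sided_ltp` are hypothetical. Missing future samples are filled in one at a time with the rule of equation (2), so the extension also works when more than $\tau$ samples are missing.

```python
from typing import List


def extrapolate(signal: List[float], tau: int, extra: int) -> List[float]:
    """Extend `signal` by `extra` samples with the periodic rule of Eq. (2):
    s(m) = s(m - tau) for every index m past the last available sample.
    Hypothetical helper; assumes an integer pitch period tau."""
    if tau <= 0 or tau > len(signal):
        raise ValueError("pitch period must satisfy 0 < tau <= len(signal)")
    out = list(signal)
    for _ in range(extra):
        # Appending sequentially lets later samples reuse extrapolated ones.
        out.append(out[len(out) - tau])
    return out


def two_sided_ltp(signal: List[float], start: int, frame_len: int,
                  tau: int) -> List[float]:
    """Compute s_p(n) = 0.5*s(n - tau) + 0.5*s(n + tau), Eq. (1), for
    n = start .. start + frame_len - 1.

    Hypothetical helper: `signal` holds past decoded samples followed by the
    current frame, and `start` (index of the first frame sample) must be
    >= tau so that s(n - tau) is always available."""
    if start < tau:
        raise ValueError("need at least tau past samples before the frame")
    end = start + frame_len
    # Indices up to end + tau - 1 are read; extrapolate any that are missing.
    missing = max(0, end + tau - len(signal))
    ext = extrapolate(signal, tau, missing)
    return [0.5 * ext[n - tau] + 0.5 * ext[n + tau] for n in range(start, end)]
```

For a perfectly periodic input whose period equals $\tau$, $s_p(n)$ reproduces $\hat{s}(n)$ exactly: `two_sided_ltp([0.0, 1.0, 0.0, -1.0] * 3, 4, 8, 4)` returns `[0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]`.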
Second, the intermediate signal $\hat{s}_f(n)$, which is pitch-emphasized over the broadband frequency range, and the difference signal $r(n)$ are obtained as follows:
$$r(n) = \hat{s}_f(n) - \hat{s}(n) \qquad (3)$$
$$\hat{s}_f(n) = (1-\alpha)\,\hat{s}(n) + \alpha\, s_p(n) \qquad (4)$$
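Substituting equation (4) into equation (3) gives $r(n) = \alpha\,(s_p(n) - \hat{s}(n))$, so $r(n)$ is exactly the pitch-emphasis component that equation (4) adds to $\hat{s}(n)$. The sketch below computes both signals for one frame; it assumes $\alpha$ has already been obtained, and the function name `pitch_enhance` is hypothetical.

```python
from typing import List, Tuple


def pitch_enhance(s_hat: List[float], s_p: List[float],
                  alpha: float) -> Tuple[List[float], List[float]]:
    """Return (s_f, r) per Eqs. (4) and (3). Hypothetical helper.

    s_f(n) = (1 - alpha) * s_hat(n) + alpha * s_p(n)    Eq. (4)
    r(n)   = s_f(n) - s_hat(n) = alpha * (s_p(n) - s_hat(n))    Eq. (3)
    """
    if len(s_hat) != len(s_p):
        raise ValueError("s_hat and s_p must have the same length")
    # Eq. (4): blend the decoded signal with its two-sided prediction.
    s_f = [(1.0 - alpha) * x + alpha * p for x, p in zip(s_hat, s_p)]
    # Eq. (3): difference between the enhanced and the decoded signal.
    r = [f - x for f, x in zip(s_f, s_hat)]
    return s_f, r
```

With $\alpha = 0$ the decoded signal passes through unchanged and $r(n)$ is zero; larger $\alpha$ moves $\hat{s}_f(n)$ toward the prediction $s_p(n)$.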
Here, the gain factor α is given by
$$\alpha = \frac{C_p}{0.5\left(E_p + 10^{0.1\,\bar{E}_{pp}}\right)} \qquad (5)$$
where $C_p$ is the inner product between $\hat{s}(n)$ and $s_p(n)$, $E_p$ is the energy of the predicted signal $s_p(n)$, and $\bar{E}_{pp}$ is the mean prediction-error energy in decibels in the present 5-ms coding frame. Note that the value of $\alpha$ computed by equation (5) is constrained as follows:
$$\alpha = \begin{cases} 0.5 & (\alpha > 0.5) \\ 0.0 & (\alpha < 0.0) \\ \alpha & (\text{otherwise}) \end{cases} \qquad (6)$$
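Equations (5) and (6) can be sketched as follows. As assumptions for illustration only, $C_p$ and $E_p$ are taken as plain sums over the samples passed in, $\bar{E}_{pp}$ is supplied by the caller because its tracking is not detailed here, and the function name `bass_pf_gain` is hypothetical. Because $10^{0.1\bar{E}_{pp}}$ is always positive, the denominator of equation (5) never vanishes, and equation (6) reduces to clipping $\alpha$ to the interval $[0, 0.5]$.

```python
from typing import List


def bass_pf_gain(s_hat: List[float], s_p: List[float],
                 e_pp_db: float) -> float:
    """Gain factor alpha of Eq. (5), clipped to [0, 0.5] as in Eq. (6).

    Hypothetical helper. Assumes C_p and E_p are plain sums over the given
    samples; e_pp_db is the mean prediction-error energy in dB, supplied
    by the caller.
    """
    c_p = sum(x * p for x, p in zip(s_hat, s_p))  # inner product C_p
    e_p = sum(p * p for p in s_p)                 # energy E_p of s_p(n)
    # 10 ** (0.1 * dB) converts the dB value back to a linear energy,
    # so the denominator is strictly positive.
    alpha = c_p / (0.5 * (e_p + 10.0 ** (0.1 * e_pp_db)))
    # Eq. (6): keep alpha within [0.0, 0.5].
    return min(max(alpha, 0.0), 0.5)
```

For example, with identical inputs `[1.0, 1.0]` and $\bar{E}_{pp} = 0$ dB, equation (5) gives $2/1.5 \approx 1.33$, which equation (6) clips to 0.5; anti-correlated inputs give a negative value that is clipped to 0.0.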
838 978-1-4799-8297-4/14/$31.00 ©2014 IEEE Asilomar 2014