Adaptive Post-Filtering Controlled by Pitch Frequency for CELP-based Speech Coder

Hironobu Chiba*, Yutaka Kamamoto†, Takehiro Moriya†, Noboru Harada†, Shigeki Miyabe*, Takeshi Yamada*, and Shoji Makino*
*Graduate School of Systems and Information Engineering, University of Tsukuba, Japan. Email: chiba@mmlab.cs.tsukuba.ac.jp
†NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Japan

Abstract: Most speech codecs use a post-filter that emphasizes pitch structures to enhance perceptual quality at the decoder. In particular, the bass post-filter used in ITU-T G.718 performs adaptive pitch enhancement in a fixed low-frequency band. This paper describes a new post-filtering method in which the frequency band and the gain of the bass post-filter are adaptively controlled frame by frame, depending on the pitch frequency of the decoded signal, to improve the performance of the bass post-filter. Objective and subjective evaluations confirmed that the developed method improves speech quality.

I. INTRODUCTION

Speech codecs, especially standard technologies based on CELP (Code Excited Linear Prediction), apply post-processing to the decoded signal to enhance its perceptual quality [1], [2], [3]. One example of such processing is a post-filter that emphasizes the formant and pitch structures [4], [5]. Whereas conventional post-filters for narrowband signals sampled at 8 kHz have been applied up to the Nyquist frequency, the ITU-T G.718 decoder for wideband speech sampled at 16 kHz applies such a post-filter, called the bass post-filter, only to the lower frequency band [6]. This paper presents an adaptive control scheme in which the bandwidth to which the post-filter is applied depends on the observed pitch frequency (the fundamental frequency, F0). The conventional method implemented in G.718 uses a fixed cut-off frequency for the post-filtering.
In contrast, the developed approach adaptively controls the cut-off frequency frame by frame. To assess the quality improvements, we conducted ITU-T P.862 PESQ (Perceptual Evaluation of Speech Quality) [7] and ITU-R BS.1534 MUSHRA (MUlti Stimulus test with Hidden Reference and Anchor) [8] experiments. The test results show that the developed method statistically improves the quality of the decoded speech signal. This method is applicable to most CELP-based speech codecs.

II. PROCESSING OF BASS POST-FILTER

Figure 1 shows a block diagram of the bass post-filter used in G.718. This post-filter emphasizes the pitch structure of the decoded signal $\hat{s}(n)$, restricted to the lower frequency band, for each 10-ms frame (hereafter referred to as a processing frame).

Fig. 1. Block diagram of the bass post-filter used in G.718.

First, $s_p(n)$ is calculated by the following equation:

$$ s_p(n) = 0.5\,\hat{s}(n-\tau) + 0.5\,\hat{s}(n+\tau) \qquad (1) $$

where $\tau$ is the pitch period obtained by the pitch tracking part. This equation means that the emphasized signal $s_p(n)$ is generated by two-sided long-term prediction. To allow the pitch prediction of equation (1) to operate properly in all cases, unavailable future samples are extrapolated according to the following rule:

$$ \hat{s}(n+L) = \hat{s}(n+L-\tau) \qquad (2) $$

Here, $L$ is set to the value required by the pitch prediction. Second, the intermediate signal $\hat{s}_f(n)$, which is pitch-emphasized over the whole (broadband) frequency range, and the difference signal $r(n)$ are obtained as follows:

$$ r(n) = \hat{s}_f(n) - \hat{s}(n) \qquad (3) $$

$$ \hat{s}_f(n) = (1-\alpha)\,\hat{s}(n) + \alpha\, s_p(n) \qquad (4) $$

Here, the gain factor $\alpha$ is given by

$$ \alpha = \frac{C_p}{0.5\left(E_p + 10^{0.1\bar{E}_{pp}}\right)} \qquad (5) $$

where $C_p$ is the inner product between $\hat{s}(n)$ and $s_p(n)$, $E_p$ is the energy of the predicted signal $s_p(n)$, and $\bar{E}_{pp}$ is the mean prediction error energy, in decibels, in the present 5-ms coding frame. Note that $\alpha$ computed by equation (5) is constrained by the following condition:
$$ \alpha = \begin{cases} 0.5, & \alpha > 0.5 \\ 0.0, & \alpha < 0.0 \end{cases} \qquad (6) $$

978-1-4799-8297-4/14/$31.00 ©2014 IEEE
Asilomar 2014
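The two-sided long-term prediction of equation (1), together with the extrapolation rule of equation (2), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the G.718 reference code: it assumes an integer pitch period τ and a buffer that already holds at least τ past samples before the frame, and the helper names `extrapolate` and `two_sided_prediction` are hypothetical. Samples that the forward term ŝ(n+τ) needs beyond the end of the buffer are filled by equation (2), which amounts to repeating the last pitch cycle.

```python
from typing import List


def extrapolate(s_hat: List[float], tau: int, extra: int) -> List[float]:
    """Hypothetical helper: append `extra` samples using eq. (2),
    s_hat(n + L) = s_hat(n + L - tau).

    Each appended sample copies the sample one pitch period earlier, so the
    extension is a periodic continuation of the last pitch cycle.
    Assumes an integer pitch period with 1 <= tau <= len(s_hat).
    """
    if not 1 <= tau <= len(s_hat):
        raise ValueError("tau must satisfy 1 <= tau <= len(s_hat)")
    ext = list(s_hat)
    for _ in range(extra):
        ext.append(ext[len(ext) - tau])
    return ext


def two_sided_prediction(s_hat: List[float], start: int, frame_len: int,
                         tau: int) -> List[float]:
    """Hypothetical helper: eq. (1),
    s_p(n) = 0.5 * s_hat(n - tau) + 0.5 * s_hat(n + tau),
    evaluated for n = start, ..., start + frame_len - 1.

    Assumes start >= tau, so the backward term only reads past samples
    that are already in the buffer.
    """
    if start < tau:
        raise ValueError("need at least tau past samples before the frame")
    # The last output sample reads index start + frame_len - 1 + tau.
    missing = start + frame_len + tau - len(s_hat)
    ext = extrapolate(s_hat, tau, max(0, missing))
    return [0.5 * ext[n - tau] + 0.5 * ext[n + tau]
            for n in range(start, start + frame_len)]


if __name__ == "__main__":
    # A signal that is exactly periodic with period 4 should be reproduced.
    s = [0.0, 1.0, 2.0, 3.0] * 5  # 20 samples
    print(two_sided_prediction(s, start=8, frame_len=12, tau=4) == s[8:20])
```

An exactly periodic input is a convenient sanity check: when τ matches the true period, both prediction terms equal ŝ(n), so s_p(n) reproduces the frame, including the samples whose forward term comes from the extrapolated extension.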
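The gain computation and mixing of equations (3) to (6) can likewise be sketched in Python. Again this is an illustrative sketch, not the G.718 implementation: the helper name `bass_postfilter_mix` is hypothetical, $C_p$ and $E_p$ are assumed to be summed over whatever segment the caller passes in, $\bar{E}_{pp}$ is assumed to be supplied by the caller in decibels, and the step that limits the enhancement to the lower frequency band is not modeled here.

```python
from typing import List, Tuple


def bass_postfilter_mix(s_hat: List[float], s_p: List[float],
                        mean_err_db: float
                        ) -> Tuple[float, List[float], List[float]]:
    """Hypothetical helper implementing eqs. (3)-(6) for one segment.

    s_hat       : decoded signal samples
    s_p         : two-sided pitch prediction from eq. (1), same length
    mean_err_db : mean prediction error energy (E_pp bar) in dB (assumed input)
    Returns (alpha, s_f, r).
    """
    if len(s_hat) != len(s_p):
        raise ValueError("s_hat and s_p must have the same length")
    c_p = sum(a * b for a, b in zip(s_hat, s_p))  # C_p: inner product of s_hat and s_p
    e_p = sum(b * b for b in s_p)                 # E_p: energy of s_p
    # Eq. (5): 10**(0.1 * dB) converts the error energy from dB to linear,
    # so the denominator is always positive.
    alpha = c_p / (0.5 * (e_p + 10.0 ** (0.1 * mean_err_db)))
    # Eq. (6): clip alpha to the range [0.0, 0.5].
    alpha = min(max(alpha, 0.0), 0.5)
    # Eq. (4): broadband pitch-emphasized signal.
    s_f = [(1.0 - alpha) * a + alpha * b for a, b in zip(s_hat, s_p)]
    # Eq. (3): difference between the emphasized and the decoded signal.
    r = [f - a for f, a in zip(s_f, s_hat)]
    return alpha, s_f, r


if __name__ == "__main__":
    alpha, s_f, r = bass_postfilter_mix([0.5], [2.0], mean_err_db=0.0)
    print(alpha, s_f, r)
```

Substituting (4) into (3) gives r(n) = α(s_p(n) − ŝ(n)), so r(n) vanishes when α is clipped to 0, and the cap of 0.5 means the prediction is never weighted more heavily than the decoded sample itself in (4).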